Re: Bulk loading data

2003-07-09 Thread Paul Chvostek

Can't use REPLACE.  The table contains a little more data than what's
supplied in the twice-weekly logfile.  The REPLACE command deletes the
old record before inserting its replacement, which in this case would
lose the data in the other fields.
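
For illustration (table and column names invented here), REPLACE wipes
any column the load doesn't supply:

  CREATE TABLE cust (
    id    INT NOT NULL PRIMARY KEY,
    name  VARCHAR(40),       -- supplied by the logfile
    notes VARCHAR(255)       -- maintained by hand, not in the logfile
  );

  -- Deletes the old row first, so notes silently reverts to NULL:
  REPLACE INTO cust (id, name) VALUES (42, 'New Name');

  -- Leaves unmentioned columns alone:
  UPDATE cust SET name = 'New Name' WHERE id = 42;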

p

On Wed, Jul 09, 2003 at 03:50:42PM +0200, Rudy Metzger wrote:
> 
> Paul,
> 
> Did you try using REPLACE instead of UPDATE/INSERT?  It could give you
> some more speed.
> 
> Cheers
> /rudy
> 
> -Original Message-
> From: Paul Chvostek [mailto:[EMAIL PROTECTED] 
> Sent: woensdag 9 juli 2003 15:21
> To: cmars
> Cc: [EMAIL PROTECTED]
> Subject: Re: Bulk loading data
> 
> On Tue, Jul 08, 2003 at 09:36:11PM -0700, cmars wrote:
> > 
> > I want to insert massive amounts of data in near real-time into a
> > MySQL database.  The data is in a structured binary format, and I have
> > code that parses the data into logical data structures.
> ...
> > How can I load data directly into the database at a lower, more direct
> > level than LOAD FILE?  What are my options to integrate my data parser
> > with MySQL?
> 
> Other than patching mysqld, I can't see how you'd get an additional
> interface to the database files.
> 
> I have a similar (though certainly not identical) issue with data I'm
> dealing with -- about 4 million records twice a week.  Source data is in
> files with fixed-length fields with a newline after each record, and
> contains some data that must be updated, and some that must be inserted.
> (I.e., I receive replacements, not diffs.)
> 
> My solution is lowbrow, but it works well.  I pipe the source file
> through a huge honkin' sed script that grabs the fields and converts
> them into SQL statements, with sed's output piped through the mysql
> text client.  (sed 's/^(...)(..)()/ etc etc/')
> 
> Each record turns into two statements: first an UPDATE (which matches
> nothing if the record doesn't exist), then an INSERT (which fails with
> a duplicate-key error if the record does exist).  I obviously run the
> mysql client with the -f option.  This
> method lets me keep the database live while the update runs, whereas
> LOAD FILE would require that I flush the data before inserting.
> 
> Sure, I'm pushing far more goop through the pipe than the data I'm
> updating, but it's just a pipe, and the run happens at 4AM when nobody
> is watching.
> 
> My db server is a 400MHz P-I and my disks are slow; it takes me over
> an hour to run the file, so I'm not sure how this solution would fare
> a hundred or a thousand times a day.

-- 
  Paul Chvostek <[EMAIL PROTECTED]>
  Operations / Abuse / Whatever
  it.canada, hosting and development   http://www.it.ca/





Re: Bulk loading data

2003-07-09 Thread Paul DuBois
At 22:01 -0700 7/8/03, cmars wrote:
>Paul,
>I have used the ODBC interface as a rough test to experiment with
>application compatibility with MySQL.  I would not consider using
>ODBC at all to load in a production environment.
>
>I want to develop a loading method that is as fast as possible.  If
>I were to use LOAD FILE, my application 1. parses the binary file,
>2. converts/outputs a text format, 3. MySQL parses this text format,
>and 4. converts to its own internal storage.  It seems to me that if
>I could simply eliminate steps 2 and 3, the load time would decrease
>dramatically, especially considering the volume of data.  Disk I/O
>alone for writing and then reading such a large temp file seems to
>me to be a waste.  I have converted these files to text before, and
>they can inflate 10x: from 10M to 100M!

Before we go any further, I need to ask something.  Are you talking
about the LOAD_FILE() function, or are you really talking about the
LOAD DATA statement?

It sounds like you really mean LOAD DATA.  The LOAD_FILE() function,
by contrast, tells the server to read a file directly (which means the
file must be located on the server host), and it doesn't need to do
any of that messing around with converting the file to text.  Thus it
should be quite efficient.
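
For reference, the two look like this (table, column, and file names
here are invented):

  LOAD DATA INFILE '/tmp/rows.txt' INTO TABLE t
    FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';

  INSERT INTO t (blob_col) VALUES (LOAD_FILE('/tmp/data.bin'));

The first bulk-parses a text file into rows; the second stuffs a whole
file, untranslated, into a single column value.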

>I know I am taking a more difficult path, but load time must be
>extremely fast for my application.  How could I implement this
>functionality in my application, or extend MySQL to do it?
>
>thanks,
>casey

> -Original Message-
> From: Paul DuBois [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 08, 2003, 9:46 PM
> To: [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: Re: Bulk loading data
>
> At 21:36 -0700 7/8/03, cmars wrote:
> >Hi,
> >I want to insert massive amounts of data in near real-time into a
> >MySQL database.  The data is in a structured binary format, and I
> >have code that parses the data into logical data structures.
> >
> >Right now I can load into a MySQL database via ODBC, but I need to
> >improve throughput.  How can I bulk insert the data more efficiently?
> >
> >I would prefer not to use LOAD FILE because it involves converting
> >the binary format to text, which will have a much larger footprint.
> >To illustrate the pain of text files in my application, a single
> >binary file might contain up to a million rows!  And I might load
> >hundreds, if not thousands of these files in a day!
> >
> >How can I load data directly into the database at a lower, more
> >direct level than LOAD FILE?  What are my options to integrate my
> >data parser with MySQL?
>
> I'm curious why you're concerned about the efficiency of LOAD FILE
> when you're using ODBC, an API known to add a fair amount of overhead.
> Have you actually tried LOAD FILE and found it empirically to be
> slower than ODBC?
>
> >thanks,
> >casey
>
> --
> Paul DuBois, Senior Technical Writer
> Madison, Wisconsin, USA
> MySQL AB, www.mysql.com
> Are you MySQL certified?  http://www.mysql.com/certification/


--
Paul DuBois, Senior Technical Writer
Madison, Wisconsin, USA
MySQL AB, www.mysql.com
Are you MySQL certified?  http://www.mysql.com/certification/



RE: Bulk loading data

2003-07-09 Thread Rudy Metzger
Paul,

Did you try using REPLACE instead of UPDATE/INSERT?  It could give you
some more speed.

Cheers
/rudy

-Original Message-
From: Paul Chvostek [mailto:[EMAIL PROTECTED] 
Sent: woensdag 9 juli 2003 15:21
To: cmars
Cc: [EMAIL PROTECTED]
Subject: Re: Bulk loading data

On Tue, Jul 08, 2003 at 09:36:11PM -0700, cmars wrote:
> 
> I want to insert massive amounts of data in near real-time into a
> MySQL database.  The data is in a structured binary format, and I have
> code that parses the data into logical data structures.
...
> How can I load data directly into the database at a lower, more direct
> level than LOAD FILE?  What are my options to integrate my data parser
> with MySQL?

Other than patching mysqld, I can't see how you'd get an additional
interface to the database files.

I have a similar (though certainly not identical) issue with data I'm
dealing with -- about 4 million records twice a week.  Source data is in
files with fixed-length fields with a newline after each record, and
contains some data that must be updated, and some that must be inserted.
(I.e., I receive replacements, not diffs.)

My solution is lowbrow, but it works well.  I pipe the source file
through a huge honkin' sed script that grabs the fields and converts
them into SQL statements, with sed's output piped through the mysql
text client.  (sed 's/^(...)(..)()/ etc etc/')

Each record turns into two statements: first an UPDATE (which matches
nothing if the record doesn't exist), then an INSERT (which fails with
a duplicate-key error if the record does exist).  I obviously run the
mysql client with the -f option.  This
method lets me keep the database live while the update runs, whereas
LOAD FILE would require that I flush the data before inserting.

Sure, I'm pushing far more goop through the pipe than the data I'm
updating, but it's just a pipe, and the run happens at 4AM when nobody
is watching.

My db server is a 400MHz P-I and my disks are slow; it takes me over
an hour to run the file, so I'm not sure how this solution would fare
a hundred or a thousand times a day.

-- 
  Paul Chvostek <[EMAIL PROTECTED]>
  Operations / Abuse / Whatever
  it.canada, hosting and development   http://www.it.ca/





Re: Bulk loading data

2003-07-09 Thread Paul Chvostek
On Tue, Jul 08, 2003 at 09:36:11PM -0700, cmars wrote:
> 
> I want to insert massive amounts of data in near real-time into a MySQL database.  
> The data is in a structured binary format, and I have code that parses the data into 
> logical data structures.
...
> How can I load data directly into the database at a lower, more direct level than
> LOAD FILE?  What are my options to integrate my data parser with MySQL?

Other than patching mysqld, I can't see how you'd get an additional
interface to the database files.

I have a similar (though certainly not identical) issue with data I'm
dealing with -- about 4 million records twice a week.  Source data is in
files with fixed-length fields with a newline after each record, and
contains some data that must be updated, and some that must be inserted.
(I.e., I receive replacements, not diffs.)

My solution is lowbrow, but it works well.  I pipe the source file
through a huge honkin' sed script that grabs the fields and converts
them into SQL statements, with sed's output piped through the mysql
text client.  (sed 's/^(...)(..)()/ etc etc/')

Each record turns into two statements: first an UPDATE (which matches
nothing if the record doesn't exist), then an INSERT (which fails with
a duplicate-key error if the record does exist).  I obviously run the
mysql client with the -f option.  This
method lets me keep the database live while the update runs, whereas
LOAD FILE would require that I flush the data before inserting.
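
A minimal sketch of the idea, with invented field widths and table and
column names (it also assumes the data contains no quote characters;
GNU sed may want -r in place of -E):

  sed -E 's/^(.{8})(.{24})(.{10})$/UPDATE t SET name="\2", amt="\3" WHERE id="\1"; INSERT INTO t (id, name, amt) VALUES ("\1", "\2", "\3");/' \
    source.txt | mysql -f mydb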

Sure, I'm pushing far more goop through the pipe than the data I'm
updating, but it's just a pipe, and the run happens at 4AM when nobody
is watching.

My db server is a 400MHz P-I and my disks are slow; it takes me over
an hour to run the file, so I'm not sure how this solution would fare
a hundred or a thousand times a day.

-- 
  Paul Chvostek <[EMAIL PROTECTED]>
  Operations / Abuse / Whatever
  it.canada, hosting and development   http://www.it.ca/





Re: Bulk loading data

2003-07-08 Thread cmars
Paul,
I have used the ODBC interface as a rough test to experiment with application 
compatibility with MySQL.  I would not consider using ODBC at all to load in a 
production environment.

I want to develop a loading method that is as fast as possible.  If I were to use LOAD
FILE, my application 1. parses the binary file, 2. converts/outputs a text format, 3.
MySQL parses this text format, and 4. converts to its own internal storage.  It seems
to me that if I could simply eliminate steps 2 and 3, the load time would decrease
dramatically, especially considering the volume of data.  Disk I/O alone for writing
and then reading such a large temp file seems to me to be a waste.  I have converted
these files to text before, and they can inflate 10x: from 10M to 100M!
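
(One conventional middle ground, sketched here with invented table and
column names: batch parsed rows into multi-row INSERT statements sent
over a regular client connection, skipping the temp file entirely:

  INSERT INTO samples (ts, sensor, value) VALUES
    (1057708800, 3, 0.52),
    (1057708801, 3, 0.55),
    (1057708802, 4, 1.07);

Building one such statement per few hundred parsed records amortizes
the per-statement parse and network cost.)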

I know I am taking a more difficult path, but load time must be extremely fast for my 
application.  How could I implement this functionality in my application, or extend 
MySQL to do it?

thanks,
casey

> -Original Message-
> From: Paul DuBois [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 08, 2003, 9:46 PM
> To: [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: Re: Bulk loading data
> 
> At 21:36 -0700 7/8/03, cmars wrote:
> >Hi,
> >I want to insert massive amounts of data in near real-time into a 
> >MySQL database.  The data is in a structured binary format, and I 
> >have code that parses the data into logical data structures.
> >
> >Right now I can load into a MySQL database via ODBC, but I need to 
> >improve throughput.  How can I bulk insert the data more efficiently?
> >
> >I would prefer not to use LOAD FILE because it involves converting 
> >the binary format to text, which will have a much larger footprint. 
> >To illustrate the pain of text files in my application, a single 
> >binary file might contain up to a million rows!  And I might load 
> >hundreds, if not thousands of these files in a day!
> >
> >How can I load data directly into the database at a lower, more
> >direct level than LOAD FILE?  What are my options to integrate my 
> >data parser with MySQL?
> 
> I'm curious why you're concerned about the efficiency of LOAD FILE
> when you're using ODBC, an API known to add a fair amount of overhead.
> 
> Have you actually tried LOAD FILE and found it empirically to be slower than
> ODBC?
> 
> 
> 
> >
> >thanks,
> >casey
> 
> 
> -- 
> Paul DuBois, Senior Technical Writer
> Madison, Wisconsin, USA
> MySQL AB, www.mysql.com
> 
> Are you MySQL certified?  http://www.mysql.com/certification/
> 




Re: Bulk loading data

2003-07-08 Thread Paul DuBois
At 21:36 -0700 7/8/03, cmars wrote:
>Hi,
>I want to insert massive amounts of data in near real-time into a
>MySQL database.  The data is in a structured binary format, and I
>have code that parses the data into logical data structures.
>
>Right now I can load into a MySQL database via ODBC, but I need to
>improve throughput.  How can I bulk insert the data more efficiently?
>
>I would prefer not to use LOAD FILE because it involves converting
>the binary format to text, which will have a much larger footprint.
>To illustrate the pain of text files in my application, a single
>binary file might contain up to a million rows!  And I might load
>hundreds, if not thousands of these files in a day!
>
>How can I load data directly into the database at a lower, more
>direct level than LOAD FILE?  What are my options to integrate my
>data parser with MySQL?

I'm curious why you're concerned about the efficiency of LOAD FILE
when you're using ODBC, an API known to add a fair amount of overhead.
Have you actually tried LOAD FILE and found it empirically to be
slower than ODBC?

>thanks,
>casey


--
Paul DuBois, Senior Technical Writer
Madison, Wisconsin, USA
MySQL AB, www.mysql.com
Are you MySQL certified?  http://www.mysql.com/certification/
