Re: Bulk loading data
Can't use REPLACE. The table contains a little more data than what's
supplied in the bi-weekly logfile. The REPLACE command deletes the old
record before inserting its replacement, which in this case would lose
the data in the other fields.

On Wed, Jul 09, 2003 at 03:50:42PM +0200, Rudy Metzger wrote:
> Paul,
>
> Did you try using REPLACE instead of UPDATE/INSERT? Could give you some
> more speed enhancement.
>
> Cheers
> /rudy

-- 
Paul Chvostek <[EMAIL PROTECTED]>
Operations / Abuse / Whatever
it.canada, hosting and development    http://www.it.ca/

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
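[Editorial note: the data-loss point above can be made concrete with a
minimal sketch. The table and column names here are hypothetical, not
from the thread.]

```sql
-- Hypothetical table: the bi-weekly feed supplies only `name`;
-- `notes` lives only in the database and must survive the load.
CREATE TABLE host (
  ip    CHAR(15) NOT NULL PRIMARY KEY,
  name  VARCHAR(64),
  notes VARCHAR(255)
);

-- REPLACE deletes the whole old row before inserting, so `notes`
-- comes back NULL:
REPLACE INTO host (ip, name) VALUES ('10.0.0.1', 'new-name');

-- The UPDATE/INSERT pair touches only the supplied columns, so an
-- existing row keeps its `notes` value (run through `mysql -f` so
-- the one statement that fails per row is skipped):
UPDATE host SET name = 'new-name' WHERE ip = '10.0.0.1';
INSERT INTO host (ip, name) VALUES ('10.0.0.1', 'new-name');
```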
Re: Bulk loading data
At 22:01 -0700 7/8/03, cmars wrote:
>Paul,
>
>I have used the ODBC interface as a rough test to experiment with
>application compatibility with MySQL. I would not consider using ODBC
>at all to load in a production environment. I want to develop a
>loading method that is as fast as possible.
>
>If I were to use LOAD FILE, my application
>1. parses the binary file,
>2. converts/outputs a text format,
>3. MySQL parses this text format, and
>4. converts to its own internal storage.
>
>It seems to me that if I could simply eliminate steps 2 and 3, the
>load time would decrease dramatically, especially considering the
>volume of data. Disk I/O alone for writing and then reading such a
>large temp file seems to me to be a waste. I have converted these
>files to text before and they can inflate 10x -- from 10M to 100M!

Before we go any further, I need to ask something. Are you talking
about the LOAD_FILE() function, or are you really talking about the
LOAD DATA statement? It sounds like you really mean LOAD DATA. The
LOAD DATA statement tells the server to read a file directly (which
means the file must be located on the server host), but it doesn't
need to do all of that messing around with converting the file to
text. Thus it should be quite efficient.

>I know I am taking a more difficult path, but load time must be
>extremely fast for my application. How could I implement this
>functionality in my application, or extend MySQL to do it?
>
>thanks,
>casey

-- 
Paul DuBois, Senior Technical Writer
Madison, Wisconsin, USA
MySQL AB, www.mysql.com

Are you MySQL certified? http://www.mysql.com/certification/
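[Editorial note: for reference, the statement in question is LOAD DATA,
not the LOAD_FILE() function. A minimal server-side sketch follows; the
file path, table name, and format clauses are hypothetical.]

```sql
-- Server-side load: the file must exist on the server host.
-- Add the LOCAL keyword to read a file on the client instead.
LOAD DATA INFILE '/var/tmp/feed.txt'
  INTO TABLE host
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n';
```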
RE: Bulk loading data
Paul,

Did you try using REPLACE instead of UPDATE/INSERT? Could give you some
more speed enhancement.

Cheers
/rudy

-----Original Message-----
From: Paul Chvostek [mailto:[EMAIL PROTECTED]
Sent: woensdag 9 juli 2003 15:21
To: cmars
Cc: [EMAIL PROTECTED]
Subject: Re: Bulk loading data

> Each record turns into two statements, first an UPDATE (which fails if
> the record doesn't exist), second an INSERT (which fails if the record
> does exist). I obviously run the mysql client with the -f option.
Re: Bulk loading data
On Tue, Jul 08, 2003 at 09:36:11PM -0700, cmars wrote:
>
> I want to insert massive amounts of data in near real-time into a
> MySQL database. The data is in a structured binary format, and I have
> code that parses the data into logical data structures.
...
> How can I load data directly into the database at a lower, more
> direct level than LOAD FILE? What are my options to integrate my data
> parser with MySQL?

Other than patching mysqld, I can't see how you'd get an additional
interface to the database files.

I have a similar (though certainly not identical) issue with data I'm
dealing with -- about 4 million records twice a week. Source data is in
files with fixed-length fields with a newline after each record, and
contains some data that must be updated, and some that must be inserted.
(I.e. I receive replacements, not diffs.)

My solution was lowbrow, but works well. I pipe the source file through
a huge honkin' sed script that grabs the fields and converts them into
an INSERT statement, with sed's output piped through the mysql text
client. (sed 's/^(...)(..)()/ etc etc/')

Each record turns into two statements, first an UPDATE (which fails if
the record doesn't exist), second an INSERT (which fails if the record
does exist). I obviously run the mysql client with the -f option. This
method lets me keep the database live while the update runs, whereas
LOAD FILE would require that I flush the data before inserting.

Sure, I'm pushing far more goop through the pipe than the data I'm
updating, but it's just a pipe, and the run happens at 4AM when nobody
is watching.

My db server is a 400MHz P-I and my disks are slow, but it takes me over
an hour to run the file, so I'm not sure how this solution would fare a
hundred or a thousand times a day.

-- 
Paul Chvostek <[EMAIL PROTECTED]>
Operations / Abuse / Whatever
it.canada, hosting and development    http://www.it.ca/
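[Editorial note: the pipeline described above can be sketched as
follows. The field widths (3-char id, 8-char name), table name `t`, and
column names are hypothetical, since the poster's actual sed script is
elided in the thread.]

```shell
# Turn each fixed-width record into an UPDATE followed by an INSERT.
# With `mysql -f`, whichever of the pair fails for a given row is
# simply skipped, so together they act as an upsert.
records='001JSMITH  
002MJONES  '
sql=$(printf '%s\n' "$records" |
  sed "s/^\(.\{3\}\)\(.\{8\}\)\$/UPDATE t SET name='\2' WHERE id='\1'; INSERT INTO t (id, name) VALUES ('\1', '\2');/")
printf '%s\n' "$sql"
# In production the output would be piped straight into: mysql -f dbname
```

Note that the trailing padding from the fixed-width fields is kept in
the values here; a real script would trim it.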
Re: Bulk loading data
Paul,

I have used the ODBC interface as a rough test to experiment with
application compatibility with MySQL. I would not consider using ODBC
at all to load in a production environment. I want to develop a
loading method that is as fast as possible.

If I were to use LOAD FILE, my application
1. parses the binary file,
2. converts/outputs a text format,
3. MySQL parses this text format, and
4. converts to its own internal storage.

It seems to me that if I could simply eliminate steps 2 and 3, the
load time would decrease dramatically, especially considering the
volume of data. Disk I/O alone for writing and then reading such a
large temp file seems to me to be a waste. I have converted these
files to text before and they can inflate 10x -- from 10M to 100M!

I know I am taking a more difficult path, but load time must be
extremely fast for my application. How could I implement this
functionality in my application, or extend MySQL to do it?

thanks,
casey

> -----Original Message-----
> From: Paul DuBois [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 08, 2003, 9:46 PM
> To: [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: Re: Bulk loading data
>
> I'm curious why you're concerned about the efficiency of LOAD FILE
> when you're using ODBC, an API known to add a fair amount of overhead.
>
> Have you actually tried LOAD FILE and found it empirically to be
> slower than ODBC?
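[Editorial note: one standard way to cut per-statement overhead when a
server-side LOAD DATA isn't an option is MySQL's multi-row INSERT
syntax. It is not mentioned in the thread, but it bears directly on the
question above. A sketch that batches parsed rows into one statement;
the table name `samples` and its columns are assumptions.]

```shell
# Convert comma-separated parsed rows into a single multi-row INSERT,
# avoiding one statement round-trip per row.
rows='1,alpha
2,beta
3,gamma'
sql=$(printf '%s\n' "$rows" | awk -F',' '
  NR == 1 { printf "INSERT INTO samples (id, val) VALUES " }
  NR > 1  { printf ", " }
          { printf "(%s, \047%s\047)", $1, $2 }
  END     { print ";" }')
printf '%s\n' "$sql"
```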
Re: Bulk loading data
At 21:36 -0700 7/8/03, cmars wrote:
>Hi,
>
>I want to insert massive amounts of data in near real-time into a
>MySQL database. The data is in a structured binary format, and I
>have code that parses the data into logical data structures.
>
>Right now I can load into a MySQL database via ODBC, but I need to
>improve throughput. How can I bulk insert the data more efficiently?
>
>I would prefer not to use LOAD FILE because it involves converting
>the binary format to text, which will have a much larger footprint.
>To illustrate the pain of text files in my application, a single
>binary file might contain up to a million rows! And I might load
>hundreds, if not thousands, of these files in a day!
>
>How can I load data directly into the database at a lower, more
>direct level than LOAD FILE? What are my options to integrate my
>data parser with MySQL?

I'm curious why you're concerned about the efficiency of LOAD FILE
when you're using ODBC, an API known to add a fair amount of overhead.

Have you actually tried LOAD FILE and found it empirically to be
slower than ODBC?

>thanks,
>casey

-- 
Paul DuBois, Senior Technical Writer
Madison, Wisconsin, USA
MySQL AB, www.mysql.com

Are you MySQL certified? http://www.mysql.com/certification/