Dan,
Right now I am using load data to load into my database. I am inserting
10,000 - 20,000 rows (x3) every 10 minutes. One table currently has
17866472 rows. Just using the date command from bash, the rough
estimates for the inserts via "load data..." are:

2006-08-18 15:39:23 : begin import into table1
2006-08-18 15:41:33 : import into table1 records: (18183) deleted: (0) skipped: (0) warnings: (0)

130 seconds for 18182 rows... and as the number of rows in the table
increases, this import time increases. Every once in a while I rotate
this table out, and when doing the same type of 'load data' it takes
< 1 second.

I have tried disabling the keys, but if I remember correctly it took
just as long, if not longer. This table has a primary key based on the
2 fields I described earlier, plus indexes on 4 other fields. All
alphanumeric fields are fixed-width char fields.

So, I am eager to see if this sorting idea helps any.

--
George

>-----Original Message-----
>From: Dan Buettner [mailto:[EMAIL PROTECTED]
>Sent: Friday, August 18, 2006 3:53 PM
>To: George Law
>Cc: MYSQL General List
>Subject: Re: sorting datafile for "load data infile"
>
>George, for raw speed into a MyISAM table, I think you'll find it hard
>to beat LOAD DATA INFILE, especially if you disable keys before and
>re-enable afterwards (which is not unlike what your friend proposes -
>creating the index in a more efficient fashion). I'd be interested to
>hear how you get on with perl vs. LOAD DATA INFILE, if you do any
>comparative benchmarks.
>
>Dan
>
>On 8/18/06, George Law <[EMAIL PROTECTED]> wrote:
>> his exact email:
>>
>> "As you know, when you are doing mass inserts of millions of rows of
>> data it can take hours. Well, these guys said that if you sort the
>> rows of data 1st by the information that will be inserted into the
>> database primary key before you do the inserts, then the total insert
>> time will take a fraction of the time.
>> The reason being that then the database doesn't have to jump back and
>> forth in the TREE structure to insert each row of data. One row will
>> be inserted immediately after the previous row that was inserted, and
>> so it takes a lot less database processing time/overhead.
>>
>> Makes sense! I thought you might be interested in this theory. They
>> claim it makes a world of difference!"
>>
>> Now, I know he references doing straight inserts, not using "load
>> data", so I am working on rewriting my code to do inserts.
>>
>> I am working on loading everything into hashes in perl, keyed off the
>> callid field. Then, realistically, I should be able to sort the hash
>> on the index and process the inserts with the sorted data.
>>
>>>-----Original Message-----
>>>From: Dan Buettner [mailto:[EMAIL PROTECTED]
>>>Sent: Friday, August 18, 2006 1:40 PM
>>>To: George Law
>>>Cc: MYSQL General List
>>>Subject: Re: sorting datafile for "load data infile"
>>>
>>>George, I've not heard of that technique - "sorting so that mysql
>>>doesn't have to jump around as much".
>>>
>>>What I am aware of, and what can make a difference, is disabling
>>>indexes prior to starting the load. Your load goes quite a bit
>>>faster, in general, but then there is some time spent updating
>>>indexes at the end. Overall it can be faster. From
>>>http://dev.mysql.com/doc/refman/5.0/en/load-data.html :
>>>
>>>If you use LOAD DATA INFILE on an empty MyISAM table, all non-unique
>>>indexes are created in a separate batch (as for REPAIR TABLE).
>>>Normally, this makes LOAD DATA INFILE much faster when you have many
>>>indexes. In some extreme cases, you can create the indexes even
>>>faster by turning them off with ALTER TABLE ... DISABLE KEYS before
>>>loading the file into the table and using ALTER TABLE ... ENABLE KEYS
>>>to re-create the indexes after loading the file. See Section 7.2.16,
>>>"Speed of INSERT Statements".
>>>
>>>Dan
>>>
>>>On 8/18/06, George Law <[EMAIL PROTECTED]> wrote:
>>>> Hello All,
>>>>
>>>> An ex-coworker of mine recently sent over a link to a mysql article
>>>> about speeding up mass imports. Unfortunately, I deleted the email
>>>> before I had time to actually digest it, and now I am wondering if
>>>> it could help with my imports.
>>>>
>>>> The article suggested sorting the data ahead of time to prevent
>>>> mysql from having to jump around as much.
>>>> What I have is a raw data file that I pre-process to create 3
>>>> separate tab-delimited files, which are then loaded into mysql
>>>> using "load data infile"...
>>>>
>>>> I am working with a table with a primary key consisting of 2 fields.
>>>>
>>>> | cdr_seq_no | int(9) unsigned | NO | PRI | 0 |   |
>>>> | callid     | char(33)        | NO | PRI |   |   |
>>>>
>>>> show index from comp_cdr;
>>>>
>>>> | Table    | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>> | comp_cdr |          0 | PRIMARY  |            1 | callid      | A         |     3454606 |     NULL | NULL   |      | BTREE      |         |
>>>> | comp_cdr |          0 | PRIMARY  |            2 | cdr_seq_no  | A         |     3454606 |     NULL | NULL   |      | BTREE      |         |
>>>> ...
>>>>
>>>> some sample data is:
>>>>
>>>> | callid                           | cdr_seq_no |
>>>> +----------------------------------+------------+
>>>> | 0000002644e5f21f4abaa204120fad41 |   89630624 |
>>>> | 0000004644e5f3db5af88248020fad41 |   89627398 |
>>>> | 0000008344e5ef975e6eafe0020fad41 |   89630801 |
>>>> | 0000009a44e5f2694aabb6cc12115a45 |   89614941 |
>>>> | 000000e044e5f2e94dd45074020fad41 |   89619029 |
>>>> | 0007bb54c8691110800100201c0060ff |   89616484 |
>>>> | 002a401ae58711d382f98183346766e7 |   89621314 |
>>>> | 007f2ad6df2cdb118dd2c879d6db9673 |   89631061 |
>>>> | 008724bcc7691110800100201c0144ff |   89628873 |
>>>> | 008bbf9dc9691110800100201c00edff |   89627287 |
>>>>
>>>> Any suggestions on exactly how to sort those fields before the load?
>>>>
>>>> Thanks!!!
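One way to do that sort is outside of mysql, on the tab-delimited file itself, before handing it to "load data infile". A minimal sketch with GNU sort, assuming callid is the first tab-separated field and cdr_seq_no the second (the real column order of the dump file isn't shown in this thread, so the -k options would need adjusting):

```shell
# Sort the tab-delimited dump by the primary-key columns before
# LOAD DATA INFILE: callid (text) first, then cdr_seq_no (numeric).
# Field positions here are assumptions; match them to the real file.
TAB="$(printf '\t')"
sort -t "$TAB" -k1,1 -k2,2n table1.tsv > table1.sorted.tsv
```

Sorting on the leading primary-key column (callid) is what matters most, since that is the order in which new entries land in the index tree.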
>>>>
>>>> --
>>>> George Law
>>>> [EMAIL PROTECTED]
>>>>
>>>> --
>>>> MySQL General Mailing List
>>>> For list archives: http://lists.mysql.com/mysql
>>>> To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
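For reference, the disable/enable-keys sequence from the 5.0 manual excerpt quoted earlier would look roughly like this; the table name and file path are placeholders, not taken from the thread:

```sql
-- Sketch only: turn off non-unique index maintenance, bulk-load the
-- pre-sorted file, then rebuild the indexes in one batch.
ALTER TABLE table1 DISABLE KEYS;

LOAD DATA INFILE '/path/to/table1.sorted.tsv'
INTO TABLE table1
FIELDS TERMINATED BY '\t';

ALTER TABLE table1 ENABLE KEYS;
```

Note that DISABLE KEYS affects only non-unique indexes; the unique PRIMARY KEY on (callid, cdr_seq_no) is still maintained row by row during the load, which may be why disabling keys did not help much here.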