Dan,
Right now I am using load data to load into my database. I am inserting
10,000 - 20,000 rows (x3) every 10 minutes. One table currently has
17866472 rows. Just using the date command from bash, the rough
estimates for the inserts via "load data..." are:

2006-08-18 15:39:23 : begin import into table1
2006-08-18 15:41:33 : import into table1 records: (18183) deleted: (0) skipped: (0) warnings: (0)

130 seconds for 18182 rows... and as the number of rows in the table
increases, this import time increases. Every once in a while I rotate
this table out, and when doing the same type of 'load data' it takes
< 1 second.

I have tried disabling the keys, but if I remember correctly it took
just as long, if not longer. This table has a primary key based on the
2 fields I described earlier, plus indexes on 4 other fields. All
alphanumeric fields are fixed-width char fields.

So, I am eager to see if this sorting idea helps any.

--
George

>-----Original Message-----
>From: Dan Buettner [mailto:[EMAIL PROTECTED]
>Sent: Friday, August 18, 2006 3:53 PM
>To: George Law
>Cc: MYSQL General List
>Subject: Re: sorting datafile for "load data infile"
>
>George, for raw speed into a MyISAM table, I think you'll find it hard
>to beat LOAD DATA INFILE, especially if you disable keys before and
>re-enable afterwards (which is not unlike what your friend proposes -
>creating the index in a more efficient fashion). I'd be interested to
>hear how you get on with perl vs. LOAD DATA INFILE, if you do any
>comparative benchmarks.
>
>Dan
>
>On 8/18/06, George Law <[EMAIL PROTECTED]> wrote:
>> his exact email:
>>
>> "As you know, when you are doing mass inserts of millions of rows of
>> data it can take hours. Well, these guys said that if you sort the
>> rows of data 1st by the information that will be inserted into the
>> database primary key before you do the inserts, then the total insert
>> time will take a fraction of the time.
>> The reason being that then the database doesn't have to jump back and
>> forth in the TREE structure to insert each row of data. One row will
>> be inserted immediately after the previous row that was inserted, and
>> so it takes a lot less database processing time/overhead.
>>
>> Makes sense! I thought you might be interested in this theory. They
>> claim it makes a world of difference!"
>>
>> Now, I know he references doing straight inserts, not using "load
>> data", so I am working on rewriting my code to do inserts.
>>
>> I am working on loading everything into hashes in perl, keyed off the
>> callid field. Then, realistically, I should be able to sort the hash
>> on the index and process the inserts with the sorted data.
>>
>>>-----Original Message-----
>>>From: Dan Buettner [mailto:[EMAIL PROTECTED]
>>>Sent: Friday, August 18, 2006 1:40 PM
>>>To: George Law
>>>Cc: MYSQL General List
>>>Subject: Re: sorting datafile for "load data infile"
>>>
>>>George, I've not heard of that technique - "sorting so that mysql
>>>doesn't have to jump around as much".
>>>
>>>What I am aware of, and what can make a difference, is disabling
>>>indexes prior to starting the load. Your load goes quite a bit
>>>faster, in general, but then there is some time spent updating
>>>indexes at the end. Overall it can be faster. From
>>>http://dev.mysql.com/doc/refman/5.0/en/load-data.html :
>>>
>>>If you use LOAD DATA INFILE on an empty MyISAM table, all non-unique
>>>indexes are created in a separate batch (as for REPAIR TABLE).
>>>Normally, this makes LOAD DATA INFILE much faster when you have many
>>>indexes. In some extreme cases, you can create the indexes even
>>>faster by turning them off with ALTER TABLE ... DISABLE KEYS before
>>>loading the file into the table and using ALTER TABLE ... ENABLE KEYS
>>>to re-create the indexes after loading the file. See Section 7.2.16,
>>>"Speed of INSERT Statements".
>>>
>>>Dan
>>>
>>>On 8/18/06, George Law <[EMAIL PROTECTED]> wrote:
>>>> Hello All,
>>>>
>>>> An ex-coworker of mine recently sent over a link to a mysql article
>>>> about speeding up mass imports. Unfortunately, I deleted the email
>>>> before I had time to actually digest it, and now I am wondering if
>>>> it could help with my imports.
>>>>
>>>> The article suggested sorting the data ahead of time to prevent
>>>> mysql from having to jump around as much.
>>>> What I have is a raw data file that I pre-process to create 3
>>>> separate tab-delimited files, which are then loaded into mysql
>>>> using "load data infile"...
>>>>
>>>> I am working with a table with a primary key consisting of 2 fields.
>>>>
>>>> | cdr_seq_no | int(9) unsigned | NO | PRI | 0 |   |
>>>> | callid     | char(33)        | NO | PRI |   |   |
>>>>
>>>> show index from comp_cdr;
>>>>
>>>> | Table    | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>> | comp_cdr |          0 | PRIMARY  |            1 | callid      | A         |     3454606 |     NULL | NULL   |      | BTREE      |         |
>>>> | comp_cdr |          0 | PRIMARY  |            2 | cdr_seq_no  | A         |     3454606 |     NULL | NULL   |      | BTREE      |         |
>>>> ...
>>>>
>>>> some sample data is:
>>>>
>>>> | callid                           | cdr_seq_no |
>>>> +----------------------------------+------------+
>>>> | 0000002644e5f21f4abaa204120fad41 |   89630624 |
>>>> | 0000004644e5f3db5af88248020fad41 |   89627398 |
>>>> | 0000008344e5ef975e6eafe0020fad41 |   89630801 |
>>>> | 0000009a44e5f2694aabb6cc12115a45 |   89614941 |
>>>> | 000000e044e5f2e94dd45074020fad41 |   89619029 |
>>>> | 0007bb54c8691110800100201c0060ff |   89616484 |
>>>> | 002a401ae58711d382f98183346766e7 |   89621314 |
>>>> | 007f2ad6df2cdb118dd2c879d6db9673 |   89631061 |
>>>> | 008724bcc7691110800100201c0144ff |   89628873 |
>>>> | 008bbf9dc9691110800100201c00edff |   89627287 |
>>>>
>>>> Any suggestions on exactly how to sort those fields before the load?
>>>>
>>>> Thanks!!!
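One way to do that sort is outside of mysql, on the tab-delimited file itself, before handing it to "load data infile". A minimal sketch with GNU sort, assuming callid is the first tab-separated field and cdr_seq_no the second (the real column order of the dump file isn't shown in this thread, so the -k options would need adjusting):

```shell
# Sort the tab-delimited dump by the primary-key columns before
# LOAD DATA INFILE: callid (text) first, then cdr_seq_no (numeric).
# Field positions here are assumptions; match them to the real file.
TAB="$(printf '\t')"
sort -t "$TAB" -k1,1 -k2,2n table1.tsv > table1.sorted.tsv
```

Sorting on the leading primary-key column (callid) is what matters most, since that is the order in which new entries land in the index tree.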
>>>>
>>>> --
>>>> George Law
>>>> [EMAIL PROTECTED]
>>>>
>>>> --
>>>> MySQL General Mailing List
>>>> For list archives: http://lists.mysql.com/mysql
>>>> To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
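For reference, the disable/enable-keys sequence from the 5.0 manual excerpt quoted earlier would look roughly like this; the table name and file path are placeholders, not taken from the thread:

```sql
-- Sketch only: turn off non-unique index maintenance, bulk-load the
-- pre-sorted file, then rebuild the indexes in one batch.
ALTER TABLE table1 DISABLE KEYS;

LOAD DATA INFILE '/path/to/table1.sorted.tsv'
INTO TABLE table1
FIELDS TERMINATED BY '\t';

ALTER TABLE table1 ENABLE KEYS;
```

Note that DISABLE KEYS affects only non-unique indexes; the unique PRIMARY KEY on (callid, cdr_seq_no) is still maintained row by row during the load, which may be why disabling keys did not help much here.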