George, for raw speed into a MyISAM table, I think you'll find it hard
to beat LOAD DATA INFILE, especially if you disable keys before the load
and re-enable them afterwards (which is not unlike what your friend
proposes - creating the indexes in a more efficient fashion).  I'd be
interested to hear how you get on with perl inserts vs. LOAD DATA
INFILE, if you do any comparative benchmarks.
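
For reference, the pattern I mean looks something like this (the table
name is from your later message; the file path and delimiter are
placeholders for whatever your pre-processing step produces):

```sql
ALTER TABLE comp_cdr DISABLE KEYS;

LOAD DATA INFILE '/path/to/cdr_file.tab'
INTO TABLE comp_cdr
FIELDS TERMINATED BY '\t';

ALTER TABLE comp_cdr ENABLE KEYS;
```

Note that DISABLE KEYS only suspends non-unique indexes on MyISAM;
unique indexes (your primary key included) are still updated row by
row during the load.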

Dan


On 8/18/06, George Law <[EMAIL PROTECTED]> wrote:
his exact email:

        "As you know, when you are doing mass inserts of millions of
rows of data it can take hours.  Well, these guys said that if you sort
the rows of data first by the information that will be inserted into the
database primary key before you do the inserts, then the total insert
time will take a fraction of the time.  The reason is that the
database doesn't have to jump back and forth in the TREE structure to
insert each row of data.  One row will be inserted immediately after the
previous row that was inserted, and so it takes a lot less database
processing time/overhead.



        Makes sense!  I thought you might be interested in this theory.
They claim it makes a world of difference!"


Now I know he references doing straight inserts, not using "load data",
so I am working on rewriting my code to do inserts.


I am working on loading everything into hashes in perl, keyed off the
callid field.  Then, realistically, I should be able to sort the hash
keys and process the inserts with the sorted data.
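
Roughly what I'm thinking, as an untested sketch - the database name,
credentials, and the read_cdr_file() parser are placeholders, and I've
only shown the two primary key columns:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# %rows: callid => cdr_seq_no, built from the raw data file
# (read_cdr_file is a stand-in for my pre-processing step)
my %rows = read_cdr_file('cdr_file.tab');

my $dbh = DBI->connect('DBI:mysql:database=cdr_db', 'user', 'pass',
                       { RaiseError => 1 });
my $sth = $dbh->prepare(
    'INSERT INTO comp_cdr (callid, cdr_seq_no) VALUES (?, ?)');

# sort keys %rows walks the callids in lexical order, which matches
# the primary key order, so each row should land right after the last
for my $callid (sort keys %rows) {
    $sth->execute($callid, $rows{$callid});
}

$dbh->disconnect;
```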



>>>-----Original Message-----
>>>From: Dan Buettner [mailto:[EMAIL PROTECTED]
>>>Sent: Friday, August 18, 2006 1:40 PM
>>>To: George Law
>>>Cc: MYSQL General List
>>>Subject: Re: sorting datafile for "load data infile"
>>>
>>>George, I've not heard of that technique - "sorting so that mysql
>>>doesn't have to jump around as much".
>>>
>>>What I am aware of, and what can make a difference, is disabling
>>>indexes prior to starting the load.  Your load goes quite a bit
>>>faster, in general, but then some time is spent at the end updating
>>>the indexes.  Overall it can be faster.  From
>>>http://dev.mysql.com/doc/refman/5.0/en/load-data.html :
>>>
>>>
>>>If you use LOAD DATA INFILE on an empty MyISAM table, all non-unique
>>>indexes are created in a separate batch (as for REPAIR TABLE).
>>>Normally, this makes LOAD DATA INFILE much faster when you have many
>>>indexes. In some extreme cases, you can create the indexes even faster
>>>by turning them off with ALTER TABLE ... DISABLE KEYS before loading
>>>the file into the table and using ALTER TABLE ... ENABLE KEYS to
>>>re-create the indexes after loading the file. See Section 7.2.16,
>>>"Speed of INSERT Statements".
>>>
>>>
>>>Dan
>>>
>>>
>>>On 8/18/06, George Law <[EMAIL PROTECTED]> wrote:
>>>> Hello All,
>>>>
>>>> An ex-coworker of mine recently sent over a link to a mysql article
>>>> about speeding up mass imports.  Unfortunately I deleted the email
>>>> before I had time to actually digest it, and now I am wondering if
>>>> it could help with my imports.
>>>>
>>>> The article suggested sorting the data ahead of time to prevent
>>>> mysql from having to jump around as much.
>>>> What I have is a raw data file that I pre-process to create 3
>>>> separate tab delimited files, which are then loaded into mysql
>>>> using "load data infile"...
>>>>
>>>>
>>>> I am working with a table with a primary key consisting of 2
>>>> fields:
>>>>
>>>> | Field      | Type            | Null | Key | Default | Extra |
>>>> | cdr_seq_no | int(9) unsigned | NO   | PRI | 0       |       |
>>>> | callid     | char(33)        | NO   | PRI |         |       |
>>>>
>>>>
>>>> show index from comp_cdr;
>>>>
>>>> +----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>> | Table    | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>> +----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>> | comp_cdr |          0 | PRIMARY  |            1 | callid      | A         |     3454606 |     NULL | NULL   |      | BTREE      |         |
>>>> | comp_cdr |          0 | PRIMARY  |            2 | cdr_seq_no  | A         |     3454606 |     NULL | NULL   |      | BTREE      |         |
>>>> +----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>> ...
>>>>
>>>> some sample data is :
>>>> | callid                           | cdr_seq_no |
>>>> +----------------------------------+------------+
>>>> | 0000002644e5f21f4abaa204120fad41 |   89630624 |
>>>> | 0000004644e5f3db5af88248020fad41 |   89627398 |
>>>> | 0000008344e5ef975e6eafe0020fad41 |   89630801 |
>>>> | 0000009a44e5f2694aabb6cc12115a45 |   89614941 |
>>>> | 000000e044e5f2e94dd45074020fad41 |   89619029 |
>>>> | 0007bb54c8691110800100201c0060ff |   89616484 |
>>>> | 002a401ae58711d382f98183346766e7 |   89621314 |
>>>> | 007f2ad6df2cdb118dd2c879d6db9673 |   89631061 |
>>>> | 008724bcc7691110800100201c0144ff |   89628873 |
>>>> | 008bbf9dc9691110800100201c00edff |   89627287 |
>>>>
>>>> Any suggestions on exactly how to sort those fields before loading?
>>>>
>>>>
>>>> Thanks!!!
>>>>
>>>> --
>>>> George Law
>>>> [EMAIL PROTECTED]
>>>>
>>>>
>>>> --
>>>> MySQL General Mailing List
>>>> For list archives: http://lists.mysql.com/mysql
>>>> To unsubscribe:
>>>http://lists.mysql.com/[EMAIL PROTECTED]
>>>>
>>>>
>>>


