Jon,

My guess is that the inserts into the UNIQUE secondary index cause the workload to become seriously disk-bound.

Two solutions:

1) Sort the rows to be inserted on the key 'email' before inserting, so that the secondary index is written in key order rather than with random disk seeks.

2) Or turn off the uniqueness checks for the duration of the import:

http://dev.mysql.com/doc/mysql/en/innodb-tuning.html
"
If you have UNIQUE constraints on secondary keys, starting from MySQL 3.23.52 and 4.0.3, you can speed up table imports by temporarily turning off the uniqueness checks during the import session:
SET UNIQUE_CHECKS=0;

For big tables, this saves a lot of disk I/O because InnoDB can use its insert buffer to write secondary index records in a batch.
"

But make sure you do not have any duplicates in the rows!
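
For example, an import session could look roughly like this (just a
sketch; the file name is made up and the LOAD DATA statement stands in
for whatever you run for each chunk of the test table from your mail
below):

SET UNIQUE_CHECKS=0;

-- load the chunks while the uniqueness checks are off
LOAD DATA INFILE '/path/to/chunk.txt' INTO TABLE test (email, when_happened);
-- ... repeat for the remaining chunks ...

-- turn the checks back on when the import is finished
SET UNIQUE_CHECKS=1;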

Note that you can now get support on this MySQL mailing list from a Vice President of Oracle. I hope that the level of support improves.

Best regards,

Heikki
Vice President, server technology
Oracle/Innobase Oy



...................
Everyone,

We're trying to do some bulk data loads on several different tables (on
several different machines, using several different techniques) and
seeing dramatically worse-than-linear performance.

We've tried the bulk-INSERT syntax, and the LOAD DATA INFILE syntax.
We've done ALTER TABLE ... DISABLE KEYS, SET FOREIGN_KEY_CHECKS=0 (where
appropriate), and so forth.
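
For concreteness, the kinds of statements we've been using look roughly
like this (using the test table described below; the sample values are
made up):

ALTER TABLE test DISABLE KEYS;
SET FOREIGN_KEY_CHECKS=0;   -- only on tables that have foreign keys

INSERT INTO test (email, when_happened) VALUES
 ('a@example.com', '2005-01-01 00:00:00'),
 ('b@example.com', '2005-01-02 00:00:00');

SET FOREIGN_KEY_CHECKS=1;
ALTER TABLE test ENABLE KEYS;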

The one that is the most immediate concern is a table of the form:

CREATE TABLE `test` (
 `email` varchar(255) NOT NULL default '',
 `when_happened` datetime NOT NULL default '0000-00-00 00:00:00',
 UNIQUE KEY `email` (`email`),
 KEY `when_happened` (`when_happened`)
) TYPE=InnoDB;

I'm loading data using LOAD DATA INFILE with chunks containing 3.4m rows
each (~135MB files).  The first chunk was very quick (about 1.5
minutes), but the tenth chunk has taken 22.6 hours and is still going.
(It's been getting progressively slower with each chunk...)
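
Each chunk is loaded with a statement of roughly this form (the path
and delimiters here are placeholders, not our real ones):

LOAD DATA INFILE '/data/chunks/chunk_10.txt' INTO TABLE test
 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
 (email, when_happened);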

The database is our main sites database, but we've dramatically reduced
the load on that machine over the past couple of months through careful
optimization of our code.  The box is a dual dual-core Opteron with 8GB
of RAM, running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit, of
course).  We have 1GB allocated to the buffer pool, our usual 1GB *
3 log files, and 8 I/O threads.

Load average on the box sits at around 6-7, with a large (>50%) amount
of time spent in I/O wait, but actual disk throughput to our software
RAID array (no longer on a SAN...) is quite low -- 6-9k blocks/s out,
1-6k blocks/s in.

Something *has* to be wrong here, but we're not sure what we've missed.
We've restored larger data sets from a mysqldump in the past in
dramatically less time on far inferior hardware: a superset of this
same data into a schema that is also a superset, plus a bunch of other
rather large tables -- all in ~8 hours on a 3Ware RAID array on a dual
Xeon w/ 4GB of RAM.

We're inclined to believe that this is a configuration problem, as
opposed to a driver or hardware problem, given the non-linear nature of
the performance degradation.  This implies we're doing something truly
stupid with our loads.  What could cause this kind of strangeness?

-JF
