Mass insert on InnoDB

2008-01-29 Thread David Schneider-Joseph

Hi all,

I am attempting to convert a very large table (~23 million rows) from  
MyISAM to InnoDB.  If I do it in chunks of one million at a time, the  
first million are very fast (approx. 3 minutes or so), and then it  
gets progressively worse, until by the time I get even to the fourth  
chunk, it's taking 15-20 minutes, and continuing to worsen.  This is  
much worse degradation than the O(log N) that you would expect.


I have determined that this is in part due to the indexes on the  
table.  I have an index on two columns, call them A and B.  There is  
no relationship between the ordering in A and the ordering in B.  If I  
create the InnoDB table with only index A, and insert the data into it  
in the order of A, then almost no gradual degradation in performance  
can be observed between chunks.  Similarly, if I create the InnoDB  
table with only index B, and insert the data ordered by B, it also  
maintains its performance.
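
For concreteness, the two variants that stay fast look like this
(column names changed; a and b stand in for the real columns, and the
other columns are omitted):

create table t_a (id int NOT NULL auto_increment, a int NOT NULL,
b int NOT NULL, primary key (id), key (a)) ENGINE=InnoDB;
insert into t_a (a, b) select a, b from big_table order by a;
-- stays fast: the pages of key (a) fill sequentially

create table t_b (id int NOT NULL auto_increment, a int NOT NULL,
b int NOT NULL, primary key (id), key (b)) ENGINE=InnoDB;
insert into t_b (a, b) select a, b from big_table order by b;
-- also stays fast

With key (a) and key (b) on the same table, no single insert order
helps: whichever column the data is not sorted by gets its index
entries inserted at essentially random positions.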


However, I have not been able to find a way to insert the data with
*both* indexes, A and B, without suffering this gradual degradation in
performance.  I have tried all sorts of methods to convert the table,
such as dumping to a file and importing back into an InnoDB table,
using "ALTER TABLE big_table ENGINE=InnoDB;", importing the data
without the second index and adding it after the fact, and in all
cases it is just a game of whack-a-mole, with the overhead being moved
elsewhere.  (Note: my primary key is an integer column, so that is not
the issue here.)
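
The chunked conversion itself is nothing fancy; each chunk is copied
roughly like this (the InnoDB table name is illustrative):

insert into big_table_innodb select * from big_table
where id between 1 and 1000000;
insert into big_table_innodb select * from big_table
where id between 1000001 and 2000000;
-- ... and so on, one chunk per statement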


This problem can even be reproduced in a very simple test case, where
I continuously insert approximately 1 million rows into a table, with
random data.  `big_table` can be any table with approximately one
million rows in id range 1 through 1000000 (we're not actually using
any data from it):


mysql> create table test (id int NOT NULL auto_increment, x int NOT
NULL, primary key (id), key (x)) ENGINE=InnoDB;

Query OK, 0 rows affected (0.08 sec)

mysql> insert into test (x) select cast(rand()*1000 as unsigned)
from big_table where id between 1 and 1000000;

Query OK, 981734 rows affected (22.23 sec)
Records: 981734  Duplicates: 0  Warnings: 0

mysql> insert into test (x) select cast(rand()*1000 as unsigned)
from big_table where id between 1 and 1000000;

Query OK, 981734 rows affected (37.03 sec)
Records: 981734  Duplicates: 0  Warnings: 0

mysql> insert into test (x) select cast(rand()*1000 as unsigned)
from big_table where id between 1 and 1000000;

Query OK, 981734 rows affected (56.41 sec)
Records: 981734  Duplicates: 0  Warnings: 0

mysql> insert into test (x) select cast(rand()*1000 as unsigned)
from big_table where id between 1 and 1000000;

Query OK, 981734 rows affected (1 min 8.47 sec)
Records: 981734  Duplicates: 0  Warnings: 0

mysql> insert into test (x) select cast(rand()*1000 as unsigned)
from big_table where id between 1 and 1000000;

Query OK, 981734 rows affected (1 min 27.67 sec)
Records: 981734  Duplicates: 0  Warnings: 0

mysql> insert into test (x) select cast(rand()*1000 as unsigned)
from big_table where id between 1 and 1000000;

Query OK, 981734 rows affected (1 min 57.93 sec)
Records: 981734  Duplicates: 0  Warnings: 0


Any ideas, anyone?

Thanks,
David




Re: Mass insert on InnoDB

2008-01-29 Thread Baron Schwartz
Hi,

On Jan 29, 2008 4:08 PM, David Schneider-Joseph [EMAIL PROTECTED] wrote:
 Hi all,

 I am attempting to convert a very large table (~23 million rows) from
 MyISAM to InnoDB.  If I do it in chunks of one million at a time, the
 first million are very fast (approx. 3 minutes or so), and then it
 gets progressively worse, until by the time I get even to the fourth
 chunk, it's taking 15-20 minutes, and continuing to worsen.  This is
 much worse degradation than the O(log N) that you would expect.

 [...]

 Any ideas, anyone?

It's because your index is bigger than your memory (or at least bigger
than your InnoDB buffer pool).  InnoDB can't build indexes by sorting
rows, so building the indexes gets slow.
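
You can see this in your own test case: if the random values are
presorted before the insert, the pages of key (x) fill sequentially
instead of being touched at random.  An untested sketch:

insert into test (x)
select x from
  (select cast(rand()*1000 as unsigned) as x
   from big_table where id between 1 and 1000000) as r
order by x;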

Perhaps you can split the table into many smaller tables, or keep
older data in an archive table that's MyISAM and keep only the newest
rows in InnoDB.  Or if you're using MySQL 5.1, this might be a
candidate for partitioning.
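
For the record, a range-partitioned 5.1 version of your test table
would look something like this (a sketch; 5.1 requires every unique
key, including the primary key, to contain the partitioning column,
which holds here):

create table test_part (id int NOT NULL auto_increment, x int NOT NULL,
primary key (id), key (x)) ENGINE=InnoDB
partition by range (id) (
  partition p0 values less than (5000000),
  partition p1 values less than (10000000),
  partition p2 values less than (15000000),
  partition p3 values less than (20000000),
  partition p4 values less than maxvalue
);
-- each partition carries its own, much smaller key (x) B-tree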




Re: Mass insert on InnoDB

2008-01-29 Thread David Schneider-Joseph

On Jan 29, 2008, at 4:37 PM, Baron Schwartz wrote:


It's because your index is bigger than your memory (or at least bigger
than your InnoDB buffer pool).  InnoDB can't build indexes by sorting
rows, so building the indexes gets slow.


Hmm, this would be an interesting theory for the main table in
question, but it doesn't seem to explain why the test case has the
same problem.  SHOW TABLE STATUS tells me that at no point does the
combined size of the data and indexes exceed 306 MB, though I've got
512 MB allocated to the buffer pool (and top tells me that no swap
space is being used):


mysql> create table test2 (id int NOT NULL auto_increment, x int NOT
NULL, primary key (id), key (x)) Engine=InnoDB;

Query OK, 0 rows affected (0.10 sec)

mysql> insert into test2 (x) select cast(rand()*1000 as
unsigned) from big_table where id between 1 and 1000000;

Query OK, 981734 rows affected (25.43 sec)
Records: 981734  Duplicates: 0  Warnings: 0

mysql> show table status like 'test2'\G
*************************** 1. row ***************************
           Name: test2
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 982123
 Avg_row_length: 29
    Data_length: 28884992
Max_data_length: 0
   Index_length: 19447808
      Data_free: 0
 Auto_increment: 981735
    Create_time: 2008-01-29 17:06:23
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment: InnoDB free: 6144 kB
1 row in set (0.08 sec)

mysql> insert into test2 (x) select cast(rand()*1000 as
unsigned) from big_table where id between 1 and 1000000;

Query OK, 981734 rows affected (36.97 sec)
Records: 981734  Duplicates: 0  Warnings: 0

mysql> show table status like 'test2'\G
*************************** 1. row ***************************
           Name: test2
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 1964237
 Avg_row_length: 28
    Data_length: 56180736
Max_data_length: 0
   Index_length: 39403520
      Data_free: 0
 Auto_increment: 1963469
    Create_time: 2008-01-29 17:06:23
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment: InnoDB free: 4096 kB
1 row in set (0.08 sec)

mysql> insert into test2 (x) select cast(rand()*1000 as
unsigned) from big_table where id between 1 and 1000000;

Query OK, 981734 rows affected (58.99 sec)
Records: 981734  Duplicates: 0  Warnings: 0

mysql> show table status like 'test2'\G
[rest of the output truncated in the archive]

Re: Mass insert on InnoDB

2008-01-29 Thread Jan Kirchhoff

David Schneider-Joseph wrote:

Hi all,

I am attempting to convert a very large table (~23 million rows) from 
MyISAM to InnoDB.  If I do it in chunks of one million at a time, the 
first million are very fast (approx. 3 minutes or so), and then it 
gets progressively worse, until by the time I get even to the fourth 
chunk, it's taking 15-20 minutes, and continuing to worsen.  This is 
much worse degradation than the O(log N) that you would expect.



[...]


This problem can even be reproduced in a very simple test case, where 
I continuously insert approximately 1 million rows into a table, with 
random data.  `big_table` can be any table with approximately one 
million rows in id range 1 through 1000000 (we're not actually using
any data from it):



[...]

Any ideas, anyone?

Thanks,
David




What hardware are you running on, and how much memory do you have?
What version of MySQL?

How did you set innodb_buffer_pool_size? You might want to read
http://dev.mysql.com/doc/refman/5.0/en/innodb-tuning.html
and do some tuning.
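
You can post your current settings with:

mysql> show variables like 'innodb%';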

In case that doesn't help you, you'll need to post more info on your config.

Jan





Re: Mass insert on InnoDB

2008-01-29 Thread David Schneider-Joseph

On Jan 29, 2008, at 6:09 PM, Jan Kirchhoff wrote:

What hardware are you running on, and how much memory do you have?
What version of MySQL?

How did you set innodb_buffer_pool_size?


Hardware:
Dual AMD Opteron 246 2.0 GHz
4 GB DDR RAM (no swap being used)
Dual 146 GB SCSI drives in RAID 1

Software:
RedHat Linux, kernel version 2.6.9-55.ELsmp
MySQL 5.0.45-community-log

InnoDB configuration:

+---------------------------------+------------------------+
| Variable_name                   | Value                  |
+---------------------------------+------------------------+
| innodb_additional_mem_pool_size | 20971520               |
| innodb_autoextend_increment     | 8                      |
| innodb_buffer_pool_awe_mem_mb   | 0                      |
| innodb_buffer_pool_size         | 536870912              |
| innodb_checksums                | ON                     |
| innodb_commit_concurrency       | 0                      |
| innodb_concurrency_tickets      | 500                    |
| innodb_data_file_path           | ibdata1:10M:autoextend |
| innodb_data_home_dir            |                        |
| innodb_doublewrite              | ON                     |
| innodb_fast_shutdown            | 1                      |
| innodb_file_io_threads          | 4                      |
| innodb_file_per_table           | ON                     |
| innodb_flush_log_at_trx_commit  | 1                      |
| innodb_flush_method             |                        |
| innodb_force_recovery           | 0                      |
| innodb_lock_wait_timeout        | 50                     |
| innodb_locks_unsafe_for_binlog  | OFF                    |
| innodb_log_arch_dir             |                        |
| innodb_log_archive              | OFF                    |
| innodb_log_buffer_size          | 1048576                |
| innodb_log_file_size            | 5242880                |
| innodb_log_files_in_group       | 2                      |
| innodb_log_group_home_dir       | ./                     |
| innodb_max_dirty_pages_pct      | 90                     |
| innodb_max_purge_lag            | 0                      |
| innodb_mirrored_log_groups      | 1                      |
| innodb_open_files               | 300                    |
| innodb_rollback_on_timeout      | OFF                    |
| innodb_support_xa               | ON                     |
| innodb_sync_spin_loops          | 20                     |
| innodb_table_locks              | ON                     |
| innodb_thread_concurrency       | 8                      |
| innodb_thread_sleep_delay       | 1                      |
+---------------------------------+------------------------+




you might want to read
http://dev.mysql.com/doc/refman/5.0/en/innodb-tuning.html
and do some tuning.


As best I can tell, our server is tuned appropriately.  We've  
definitely spent effort on tuning it already.


In case that doesn't help you, you'll need to post more info on your  
config.


Done. :)  Thanks for your assistance.


Jan


David




Re: Mass insert on InnoDB

2008-01-29 Thread Jan Kirchhoff

David Schneider-Joseph wrote:

On Jan 29, 2008, at 6:09 PM, Jan Kirchhoff wrote:

What hardware are you running on, and how much memory do you have?
What version of MySQL?

How did you set innodb_buffer_pool_size?


Hardware:
Dual AMD Opteron 246 2.0 GHz
4 GB DDR RAM (no swap being used)
Dual 146 GB SCSI drives with a RAID 1

Software:
RedHat Linux, kernel version 2.6.9-55.ELsmp
MySQL 5.0.45-community-log


[...]
As best I can tell, our server is tuned appropriately.  We've 
definitely spent effort on tuning it already.


This is on the replication slave that I use for backups:

mysql> create table test (id int NOT NULL auto_increment, x int NOT
NULL, primary key (id), key (x)) ENGINE=InnoDB;

Query OK, 0 rows affected (0.05 sec)

mysql> insert into test (x) select cast(rand()*1000 as unsigned)
from verybigtable limit 1000000;

Query OK, 1000000 rows affected (20.20 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

mysql> insert into test (x) select cast(rand()*1000 as unsigned)
from verybigtable limit 1000000;

Query OK, 1000000 rows affected (17.60 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

mysql> insert into test (x) select cast(rand()*1000 as unsigned)
from verybigtable limit 1000000;

Query OK, 1000000 rows affected (15.67 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

mysql> insert into test (x) select cast(rand()*1000 as unsigned)
from verybigtable limit 1000000;

Query OK, 1000000 rows affected (14.91 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

mysql> insert into test (x) select cast(rand()*1000 as unsigned)
from verybigtable limit 1000000;

Query OK, 1000000 rows affected (17.89 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

mysql> insert into test (x) select cast(rand()*1000 as unsigned)
from verybigtable limit 1000000;

Query OK, 1000000 rows affected (16.24 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

Your innodb_log_buffer_size and innodb_log_file_size look tiny.  This
is my config on that system (dual-socket dual-core Opteron 2216 with an
Areca SATA RAID 10 (w/ BBU) and 16 GB RAM):

+---------------------------------+---------------------------------------+
| Variable_name                   | Value                                 |
+---------------------------------+---------------------------------------+
| innodb_additional_mem_pool_size | 16777216                              |
| innodb_autoextend_increment     | 1000                                  |
| innodb_buffer_pool_awe_mem_mb   | 0                                     |
| innodb_buffer_pool_size         | 1048576                               |
| innodb_checksums                | ON                                    |
| innodb_commit_concurrency       | 0                                     |
| innodb_concurrency_tickets      | 500                                   |
| innodb_data_file_path           | /var/lib/mysql/ibdata:100M:autoextend |
| innodb_data_home_dir            |                                       |
| innodb_doublewrite              | ON                                    |
| innodb_fast_shutdown            | 1                                     |
| innodb_file_io_threads          | 4                                     |
| innodb_file_per_table           | ON                                    |
| innodb_flush_log_at_trx_commit  | 0                                     |
| innodb_flush_method             |                                       |
| innodb_force_recovery           | 0                                     |
| innodb_lock_wait_timeout        | 120                                   |
| innodb_locks_unsafe_for_binlog  | OFF                                   |
| innodb_log_arch_dir             | /var/lib/mysql2/innodb/               |
| innodb_log_archive              | OFF                                   |
| innodb_log_buffer_size          | 33554432                              |
| innodb_log_file_size            | 536870912                             |
| innodb_log_files_in_group       | 3                                     |
| innodb_log_group_home_dir       | /var/lib/mysql2/innodb/               |
| innodb_max_dirty_pages_pct      | 90                                    |
| innodb_max_purge_lag            | 0                                     |
| innodb_mirrored_log_groups      | 1                                     |
| innodb_open_files               | 300                                   |
| innodb_rollback_on_timeout      | OFF                                   |
| innodb_support_xa               | OFF                                   |
| innodb_sync_spin_loops          | 20                                    |
| innodb_table_locks              | ON                                    |
| innodb_thread_concurrency       | 16                                    |
| innodb_thread_sleep_delay       | 1                                     |
+---------------------------------+---------------------------------------+

play around with innodb_log_buffer_size, innodb_log_file_size and
try to set innodb_flush_log_at_trx_commit=0.

Don't you have a BBU on your RAID controller?
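
In my.cnf terms, something along these lines (the sizes are just the
ones from my box above; treat them as a starting point, not gospel):

[mysqld]
innodb_log_buffer_size = 32M
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 0

Let me know if that changes anything.

Jan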

Re: Mass insert on InnoDB

2008-01-29 Thread BJ Swope
Drop the indexes for the conversion, then rebuild the indexes after the
tables are converted.
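
Using David's test table as an illustration (the real table and index
names will differ):

alter table test drop key x;
-- bulk-load all the rows with only the primary key in place
alter table test add key (x);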

On Jan 29, 2008 4:08 PM, David Schneider-Joseph [EMAIL PROTECTED] wrote:

 Hi all,

 I am attempting to convert a very large table (~23 million rows) from
 MyISAM to InnoDB.  If I do it in chunks of one million at a time, the
 first million are very fast (approx. 3 minutes or so), and then it
 gets progressively worse, until by the time I get even to the fourth
 chunk, it's taking 15-20 minutes, and continuing to worsen.  This is
 much worse degradation than the O(log N) that you would expect.

 [...]

 Any ideas, anyone?

 Thanks,
 David




-- 
We are all slave to our own paradigm. -- Joshua Williams

If the letters PhD appear after a person's name, that person will remain
outdoors even after it's started raining. -- Jeff Kay


Re: Mass insert on InnoDB

2008-01-29 Thread David Schneider-Joseph

On Jan 29, 2008, at 7:04 PM, Jan Kirchhoff wrote:

play around with innodb_log_buffer_size, innodb_log_file_size and
try to set innodb_flush_log_at_trx_commit=0.

Don't you have a BBU on your RAID controller?

Let me know if that changes anything.


That did it!  I upped the log_buffer_size to 8M and the log_file_size  
to 128M, and the problem went away entirely!
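
For anyone who finds this thread later: innodb_log_file_size can't be
changed on a running server.  The new size only takes effect after a
clean shutdown, moving the old ib_logfile* files out of the data
directory, and restarting.  The relevant my.cnf lines end up as:

[mysqld]
innodb_log_buffer_size = 8M
innodb_log_file_size = 128M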


Thank you very very much, Jan. :)

And thanks to everyone else as well who offered their advice.

David





Re: Mass insert on InnoDB

2008-01-29 Thread David Schneider-Joseph

On Jan 29, 2008, at 6:21 PM, BJ Swope wrote:


Drop the indexes for the conversion, then rebuild the indexes after the
tables are converted.


As noted in my original email, I tried that, but Jan's suggestion re:  
InnoDB tuning fixed it.


Thanks for the advice, everyone!

David
