Mass insert on InnoDB
Hi all, I am attempting to convert a very large table (~23 million rows) from MyISAM to InnoDB. If I do it in chunks of one million at a time, the first million are very fast (approx. 3 minutes or so), and then it gets progressively worse, until by the time I get even to the fourth chunk, it's taking 15-20 minutes, and continuing to worsen. This is much worse degradation than the O*log(N) that you would expect. I have determined that this is in part due to the indexes on the table. I have an index on two columns, call them A and B. There is no relationship between the ordering in A and the ordering in B. If I create the InnoDB table with only index A, and insert the data into it in the order of A, then almost no gradual degradation in performance can be observed between chunks. Similarly, if I create the InnoDB table with only index B, and insert the data ordered by B, it also maintains its performance. However, I have not been able find a way to insert the data with *both* indexes, A and B, without suffering this gradual degradation in performance. I have tried all sorts of methods to convert the table, such as dumping to a file and importing back into an InnoDB table, using ALTER TABLE big_table ENGINE=InnoDB;, importing the data without the second index and adding it after the fact, and in all cases it is just a game of whackamole, with the overhead being moved elsewhere. (Note: my primary key is an integer column, so that is not the issue here.) This problem can even be reproduced in a very simple test case, where I continuously insert approximately 1 million rows into a table, with random data. `big_table` can be any table with approximately one million rows in id range 1 through 100 (we're not actually using any data from it): mysql create table test (id int NOT NULL auto_increment, x int NOT NULL, primary key (id), key (x)) ENGINE=InnoDB; Query OK, 0 rows affected (0.08 sec) mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (22.23 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (37.03 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (56.41 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (1 min 8.47 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (1 min 27.67 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (1 min 57.93 sec) Records: 981734 Duplicates: 0 Warnings: 0 Any ideas, anyone? Thanks, David -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mass insert on InnoDB
Hi, On Jan 29, 2008 4:08 PM, David Schneider-Joseph [EMAIL PROTECTED] wrote: Hi all, I am attempting to convert a very large table (~23 million rows) from MyISAM to InnoDB. If I do it in chunks of one million at a time, the first million are very fast (approx. 3 minutes or so), and then it gets progressively worse, until by the time I get even to the fourth chunk, it's taking 15-20 minutes, and continuing to worsen. This is much worse degradation than the O*log(N) that you would expect. I have determined that this is in part due to the indexes on the table. I have an index on two columns, call them A and B. There is no relationship between the ordering in A and the ordering in B. If I create the InnoDB table with only index A, and insert the data into it in the order of A, then almost no gradual degradation in performance can be observed between chunks. Similarly, if I create the InnoDB table with only index B, and insert the data ordered by B, it also maintains its performance. However, I have not been able find a way to insert the data with *both* indexes, A and B, without suffering this gradual degradation in performance. I have tried all sorts of methods to convert the table, such as dumping to a file and importing back into an InnoDB table, using ALTER TABLE big_table ENGINE=InnoDB;, importing the data without the second index and adding it after the fact, and in all cases it is just a game of whackamole, with the overhead being moved elsewhere. (Note: my primary key is an integer column, so that is not the issue here.) This problem can even be reproduced in a very simple test case, where I continuously insert approximately 1 million rows into a table, with random data. `big_table` can be any table with approximately one million rows in id range 1 through 100 (we're not actually using any data from it): mysql create table test (id int NOT NULL auto_increment, x int NOT NULL, primary key (id), key (x)) ENGINE=InnoDB; Query OK, 0 rows affected (0.08 sec) mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (22.23 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (37.03 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (56.41 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (1 min 8.47 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (1 min 27.67 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (1 min 57.93 sec) Records: 981734 Duplicates: 0 Warnings: 0 Any ideas, anyone? It's because your index is bigger than your memory (or at least bigger than your InnoDB buffer pool). InnoDB can't build indexes by sorting rows, so building the indexes gets slow. Perhaps you can keep the table as many small tables, or keep older data in an archive table that's MyISAM and keep only the newest rows in InnoDB. Or if you're using MySQL 5.1, this might be a candidate for partitioning. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mass insert on InnoDB
On Jan 29, 2008, at 4:37 PM, Baron Schwartz wrote: It's because your index is bigger than your memory (or at least bigger than your InnoDB buffer pool). InnoDB can't build indexes by sorting rows, so building the indexes gets slow. Hmm, this would be an interesting theory for the main table in question, but it doesn't seem to explain why the test case has the same problem. A show table status tells me that at no point does the combined size of the data and indexes exceed 306 MB, though I've got 512 MB allocated to the buffer pool (and top tells me that no swap space is being used): mysql create table test2 (id int NOT NULL auto_increment, x int NOT NULL, primary key (id), key (x)) Engine=InnoDB; Query OK, 0 rows affected (0.10 sec) mysql insert into test2 (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (25.43 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql show table status like 'test2'; +---++-+++ +-+-+--+--- ++-+-+ +---+--+ +--+ | Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment | +---++-+++ +-+-+--+--- ++-+-+ +---+--+ +--+ | test2 | InnoDB | 10 | Compact| 982123 | 29 |28884992 | 0 | 19447808 | 0 | 981735 | 2008-01-29 17:06:23 | NULL| NULL | latin1_swedish_ci | NULL || InnoDB free: 6144 kB | +---++-+++ +-+-+--+--- ++-+-+ +---+--+ +--+ 1 row in set (0.08 sec) mysql insert into test2 (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (36.97 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql show table status like 'test2'; +---++-++-+ +-+-+--+--- ++-+-+ +---+--+ +--+ | Name | Engine | Version | Row_format | Rows| Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment | +---++-++-+ +-+-+--+--- ++-+-+ +---+--+ +--+ | test2 | InnoDB | 10 | Compact| 1964237 | 28 |56180736 | 0 | 39403520 | 0 | 1963469 | 2008-01-29 17:06:23 | NULL| NULL | latin1_swedish_ci | NULL || InnoDB free: 4096 kB | +---++-++-+ +-+-+--+--- ++-+-+ +---+--+ +--+ 1 row in set (0.08 sec) mysql insert into test2 (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (58.99 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql show table status like 'test2'; +---++-++-+ +-+-+--+--- ++-+-+ +---+--+ +--+ | Name | Engine | Version | Row_format | Rows| Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment | +---++-++-+ +-+-+--+---
Re: Mass insert on InnoDB
David Schneider-Joseph schrieb: Hi all, I am attempting to convert a very large table (~23 million rows) from MyISAM to InnoDB. If I do it in chunks of one million at a time, the first million are very fast (approx. 3 minutes or so), and then it gets progressively worse, until by the time I get even to the fourth chunk, it's taking 15-20 minutes, and continuing to worsen. This is much worse degradation than the O*log(N) that you would expect. [...] This problem can even be reproduced in a very simple test case, where I continuously insert approximately 1 million rows into a table, with random data. `big_table` can be any table with approximately one million rows in id range 1 through 100 (we're not actually using any data from it): [...] Any ideas, anyone? Thanks, David what hardware are you running on and you much memory do you have? what version of mysql?| | How did you set innodb_buffer_pool_size? you might want to read http://dev.mysql.com/doc/refman/5.0/en/innodb-tuning.html and do some tuning. In case that doesn't help you, you'll need to post more info on your config. Jan -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mass insert on InnoDB
On Jan 29, 2008, at 6:09 PM, Jan Kirchhoff wrote: what hardware are you running on and you much memory do you have? what version of mysql?| | How did you set innodb_buffer_pool_size? Hardware: Dual AMD Opteron 246 2.0 GHz 4 GB DDR RAM (no swap being used) Dual 146 GB SCSI drives with a RAID 1 Software: RedHat Linux, kernel version 2.6.9-55.ELsmp MySQL 5.0.45-community-log InnoDB configuration: +-++ | Variable_name | Value | +-++ | innodb_additional_mem_pool_size | 20971520 | | innodb_autoextend_increment | 8 | | innodb_buffer_pool_awe_mem_mb | 0 | | innodb_buffer_pool_size | 536870912 | | innodb_checksums| ON | | innodb_commit_concurrency | 0 | | innodb_concurrency_tickets | 500| | innodb_data_file_path | ibdata1:10M:autoextend | | innodb_data_home_dir|| | innodb_doublewrite | ON | | innodb_fast_shutdown| 1 | | innodb_file_io_threads | 4 | | innodb_file_per_table | ON | | innodb_flush_log_at_trx_commit | 1 | | innodb_flush_method || | innodb_force_recovery | 0 | | innodb_lock_wait_timeout| 50 | | innodb_locks_unsafe_for_binlog | OFF| | innodb_log_arch_dir || | innodb_log_archive | OFF| | innodb_log_buffer_size | 1048576| | innodb_log_file_size| 5242880| | innodb_log_files_in_group | 2 | | innodb_log_group_home_dir | ./ | | innodb_max_dirty_pages_pct | 90 | | innodb_max_purge_lag| 0 | | innodb_mirrored_log_groups | 1 | | innodb_open_files | 300| | innodb_rollback_on_timeout | OFF| | innodb_support_xa | ON | | innodb_sync_spin_loops | 20 | | innodb_table_locks | ON | | innodb_thread_concurrency | 8 | | innodb_thread_sleep_delay | 1 | +-++ you might want to read http://dev.mysql.com/doc/refman/5.0/en/innodb-tuning.html and do some tuning. As best I can tell, our server is tuned appropriately. We've definitely spent effort on tuning it already. In case that doesn't help you, you'll need to post more info on your config. Done. :) Thanks for your assistance. Jan David -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mass insert on InnoDB
David Schneider-Joseph schrieb: On Jan 29, 2008, at 6:09 PM, Jan Kirchhoff wrote: what hardware are you running on and you much memory do you have? what version of mysql?| | How did you set innodb_buffer_pool_size? Hardware: Dual AMD Opteron 246 2.0 GHz 4 GB DDR RAM (no swap being used) Dual 146 GB SCSI drives with a RAID 1 Software: RedHat Linux, kernel version 2.6.9-55.ELsmp MySQL 5.0.45-community-log [...] As best I can tell, our server is tuned appropriately. We've definitely spent effort on tuning it already. This is on my Replication-Slave I use for backups: mysql create table test (id int NOT NULL auto_increment, x int NOT NULL, primary key (id), key (x)) ENGINE=InnoDB; Query OK, 0 rows affected (0.05 sec) mysql insert into test (x) select cast(rand()*1000 as unsigned) from verybigtable limit 100; Query OK, 100 rows affected (20.20 sec) Records: 100 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from verybigtable limit 100; Query OK, 100 rows affected (17.60 sec) Records: 100 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from verybigtable limit 100; Query OK, 100 rows affected (15.67 sec) Records: 100 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from verybigtable limit 100; Query OK, 100 rows affected (14.91 sec) Records: 100 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from verybigtable limit 100; Query OK, 100 rows affected (17.89 sec) Records: 100 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from verybigtable limit 100; Query OK, 100 rows affected (16.24 sec) Records: 100 Duplicates: 0 Warnings: 0 your innodb_log_buffer_size and innodb_log_file_size look tiny, this is my config on the system: (dual-socket dual-core opteron 2216 with Areca sata-raid10 (w/ bbu) and 16gb RAM) +-+---+ | Variable_name | Value | +-+---+ | innodb_additional_mem_pool_size | 16777216 | | innodb_autoextend_increment | 1000 | | innodb_buffer_pool_awe_mem_mb | 0 | | innodb_buffer_pool_size | 1048576 | | innodb_checksums| ON| | innodb_commit_concurrency | 0 | | innodb_concurrency_tickets | 500 | | innodb_data_file_path | /var/lib/mysql/ibdata:100M:autoextend | | innodb_data_home_dir| | | innodb_doublewrite | ON| | innodb_fast_shutdown| 1 | | innodb_file_io_threads | 4 | | innodb_file_per_table | ON| | innodb_flush_log_at_trx_commit | 0 | | innodb_flush_method | | | innodb_force_recovery | 0 | | innodb_lock_wait_timeout| 120 | | innodb_locks_unsafe_for_binlog | OFF | | innodb_log_arch_dir | /var/lib/mysql2/innodb/ | | innodb_log_archive | OFF | | innodb_log_buffer_size | 33554432 | | innodb_log_file_size| 536870912 | | innodb_log_files_in_group | 3 | | innodb_log_group_home_dir | /var/lib/mysql2/innodb/ | | innodb_max_dirty_pages_pct | 90| | innodb_max_purge_lag| 0 | | innodb_mirrored_log_groups | 1 | | innodb_open_files | 300 | | innodb_rollback_on_timeout | OFF | | innodb_support_xa | OFF | | innodb_sync_spin_loops | 20| | innodb_table_locks | ON| | innodb_thread_concurrency | 16| | innodb_thread_sleep_delay | 1 | +-+---+ play around with
Re: Mass insert on InnoDB
drop the indexes for the conversion then rebuild the indexes after the tables are converted. On Jan 29, 2008 4:08 PM, David Schneider-Joseph [EMAIL PROTECTED] wrote: Hi all, I am attempting to convert a very large table (~23 million rows) from MyISAM to InnoDB. If I do it in chunks of one million at a time, the first million are very fast (approx. 3 minutes or so), and then it gets progressively worse, until by the time I get even to the fourth chunk, it's taking 15-20 minutes, and continuing to worsen. This is much worse degradation than the O*log(N) that you would expect. I have determined that this is in part due to the indexes on the table. I have an index on two columns, call them A and B. There is no relationship between the ordering in A and the ordering in B. If I create the InnoDB table with only index A, and insert the data into it in the order of A, then almost no gradual degradation in performance can be observed between chunks. Similarly, if I create the InnoDB table with only index B, and insert the data ordered by B, it also maintains its performance. However, I have not been able find a way to insert the data with *both* indexes, A and B, without suffering this gradual degradation in performance. I have tried all sorts of methods to convert the table, such as dumping to a file and importing back into an InnoDB table, using ALTER TABLE big_table ENGINE=InnoDB;, importing the data without the second index and adding it after the fact, and in all cases it is just a game of whackamole, with the overhead being moved elsewhere. (Note: my primary key is an integer column, so that is not the issue here.) This problem can even be reproduced in a very simple test case, where I continuously insert approximately 1 million rows into a table, with random data. `big_table` can be any table with approximately one million rows in id range 1 through 100 (we're not actually using any data from it): mysql create table test (id int NOT NULL auto_increment, x int NOT NULL, primary key (id), key (x)) ENGINE=InnoDB; Query OK, 0 rows affected (0.08 sec) mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (22.23 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (37.03 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (56.41 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (1 min 8.47 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (1 min 27.67 sec) Records: 981734 Duplicates: 0 Warnings: 0 mysql insert into test (x) select cast(rand()*1000 as unsigned) from big_table where id between 1 and 100; Query OK, 981734 rows affected (1 min 57.93 sec) Records: 981734 Duplicates: 0 Warnings: 0 Any ideas, anyone? Thanks, David -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- We are all slave to our own paradigm. -- Joshua Williams If the letters PhD appear after a person's name, that person will remain outdoors even after it's started raining. -- Jeff Kay
Re: Mass insert on InnoDB
On Jan 29, 2008, at 7:04 PM, Jan Kirchhoff wrote: play around with innodb_log_buffer_size, innodb_log_file_size and try to set innodb_flush_log_at_trx_commit=0. Do you don't have a BBU on your raid-controller? let me know if that changes anything. That did it! I upped the log_buffer_size to 8M and the log_file_size to 128M, and the problem went away entirely! Thank you very very much, Jan. :) And thanks to everyone else as well who offered their advice. David -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mass insert on InnoDB
On Jan 29, 2008, at 6:21 PM, BJ Swope wrote: drop the indexes for the conversion then rebuild the indexes after the tables are converted. As noted in my original email, I tried that, but Jan's suggestion re: InnoDB tuning fixed it. Thanks for the advice, everyone! David -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]