Re: Non-linear degradation in bulk loads?
Jon, using a 4 GB buffer pool is safe on a 64-bit computer, and allocating 50-80 % of memory to the buffer pool is recommended for an InnoDB-only server.

Regards, Heikki
Oracle/Innobase

----- Original Message -----
From: Jon Frisby [EMAIL PROTECTED]
To: Heikki Tuuri [EMAIL PROTECTED]; mysql@lists.mysql.com
Sent: Tuesday, October 18, 2005 4:39 AM
Subject: RE: Non-linear degradation in bulk loads?

Side question: If I use a 64-bit MySQL build on a 64-bit kernel, is it safe and sane to allocate, say, 6GB to the InnoDB buffer pool? On an 8GB box with a 64-bit software stack, what is the optimum memory allocation for a pure-InnoDB (MyISAM used only for grant tables) mysql server running as the sole application on the machine?

-JF

-----Original Message-----
From: Heikki Tuuri [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 17, 2005 10:55 AM
To: mysql@lists.mysql.com
Subject: Re: Non-linear degradation in bulk loads?

Jon, I am not 100 % sure that the problem we saw was on 64-bit Linux; it might have been 32-bit. Anyway, since CentOS is a clone of RHEL, this might be the same file cache phenomenon. I do not know whether one can force the file cache to stay smaller than 4 GB. You can try running some dummy programs that occupy a few GB of memory.

Regards, Heikki
Oracle/Innobase

----- Original Message -----
From: Jon Frisby [EMAIL PROTECTED]
To: Heikki Tuuri [EMAIL PROTECTED]; mysql@lists.mysql.com
Sent: Monday, October 17, 2005 8:49 PM
Subject: RE: Non-linear degradation in bulk loads?

Sorry to spam the group, but I just noticed that I asserted we were on a 2.4 kernel -- we're on 2.6.9-11 as part of CentOS 4.1...
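Heikki's sizing guideline (50-80 % of RAM to the buffer pool on a dedicated InnoDB-only server) might look like the following my.cnf fragment for the 8 GB box discussed in this thread. This is only a sketch: the specific numbers are illustrative assumptions, not settings posted by anyone here.

```ini
# Hypothetical my.cnf fragment for a dedicated 8 GB, 64-bit, InnoDB-only server.
# Values are illustrative; tune for your own workload.
[mysqld]
innodb_buffer_pool_size = 6144M   # ~75% of RAM, within the 50-80% guideline
innodb_log_file_size = 1024M      # the thread mentions 1GB * 3 log files
innodb_log_files_in_group = 3
innodb_file_io_threads = 8        # matches the "8 I/O threads" mentioned in the thread
key_buffer = 16M                  # MyISAM used only for the grant tables
```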
-JF

-----Original Message-----
From: Heikki Tuuri [mailto:[EMAIL PROTECTED]]
Sent: Sunday, October 16, 2005 2:23 AM
To: mysql@lists.mysql.com
Subject: Re: Non-linear degradation in bulk loads?

Jon, your hardware/OS combination is quite new and unusual. It might indeed be an OS problem. We observed on 64-bit RHEL 4 that when the OS file cache grew bigger than 4 GB, file I/O performance dropped to one tenth of normal. You would not expect that kind of behavior from a 64-bit OS. When you see the slowdown, what does 'top' say about the OS file cache size?

The database is our main site's database, but we've dramatically reduced the load on that machine over the past couple of months through careful optimization of our code. The box is a dual, dual-core Opteron with 8GB of RAM, running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit, of course). We have 1GB allocated to the buffer pool, and our usual 1GB * 3 log files. 8 I/O threads. Load on the box sits at around 6-7, with a large (50%) amount of time spent in wait state, but actual disk throughput to our software RAID array (no longer on a SAN...) is quite low -- 6-9k blocks/s out, 1-6k blocks/s in.

Regards, Heikki

----- Original Message -----
From: Jon Frisby [EMAIL PROTECTED]
Newsgroups: mailing.database.myodbc
Sent: Friday, October 14, 2005 10:39 PM
Subject: RE: Non-linear degradation in bulk loads?

I've tried tweaking the structure of the schema to have, for example, a PRIMARY KEY index on email and no other indexes, and then inserting in sorted order -- it made no improvement whatsoever. Another clue that leads me to believe this may be an OS issue: starting a large cp on the same box (from a local filesystem other than the one the InnoDB data pool was on, to NFS) caused MySQL to become COMPLETELY backlogged (we went from ~15-20 connections at any given instant to 750, our max_connections setting).
-JF

-----Original Message-----
From: Heikki Tuuri [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 12, 2005 8:15 AM
To: mysql@lists.mysql.com
Subject: Re: Non-linear degradation in bulk loads?

Jon, hmm... maybe one of the indexes inevitably is in a random order. Please post a typical

SHOW INNODB STATUS\G

when the inserts happen slowly. What is your my.cnf like?

Regards, Heikki
Innobase/Oracle
RE: Non-linear degradation in bulk loads?
About 6GB... Is there any way to forcibly limit this?

-JF

-----Original Message-----
From: Heikki Tuuri [mailto:[EMAIL PROTECTED]]
Sent: Sunday, October 16, 2005 2:23 AM
To: mysql@lists.mysql.com
Subject: Re: Non-linear degradation in bulk loads?

When you see the slowdown, what does 'top' say about the OS file cache size?
----- Original Message -----
From: Jon Frisby [EMAIL PROTECTED]
To: Heikki Tuuri [EMAIL PROTECTED]; mysql@lists.mysql.com
Sent: Wednesday, October 12, 2005 3:08 AM
Subject: RE: Non-linear degradation in bulk loads?

Two solutions: 1) sort the rows to be inserted on the key 'email' before inserting. 2) Or: http://dev.mysql.com/doc/mysql/en/innodb-tuning.html

If you have UNIQUE constraints on secondary keys, starting from MySQL 3.23.52 and 4.0.3, you can speed up table imports by temporarily turning off the uniqueness checks during the import session:

SET UNIQUE_CHECKS=0;

For big tables, this saves a lot of disk I/O because InnoDB can use its insert buffer to write secondary index records in a batch. But make sure you do not have any duplicates in the rows!

After sending my mail, I discovered SET UNIQUE_CHECKS=0, and subsequently it also occurred to me to try putting the data in in sorted order. Unfortunately, UNIQUE_CHECKS=0 did not work, and even the combination of both did not work. The first chunk (3.4m rows) took ~1.5 minutes, the second ~5 minutes...

At this point I'm inclined to believe that there is something very wrong with the disk subsystem, because of this and other problems (doing a large cp from the data pool filesystem to another filesystem brought the database to a near-halt, among other things).
As a stop-gap solution, I created the table with no indexes and loaded all the data (it loaded in linear time), and I plan on doing a CREATE UNIQUE INDEX on the table. Will this happen in linear time, or near-linear time?

*sigh*

-JF

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
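The two workarounds discussed in this exchange -- inserting rows in key order so the B-tree is filled sequentially, and disabling uniqueness checks for the import session -- can be sketched together as a bulk-load script generator. This is a hypothetical illustration, not code from the thread; the table name, column name, and chunk size are made-up, and a real version would need proper SQL escaping.

```python
def bulk_load_statements(rows, table="subscribers", key_col="email", chunk_size=3):
    """Emit SQL for a bulk load: rows sorted on the indexed key so the
    B-tree is filled in order, with uniqueness checks disabled for the
    session (the caller must guarantee the rows are duplicate-free).
    NOTE: values are interpolated naively for illustration only."""
    stmts = ["SET UNIQUE_CHECKS=0;"]
    ordered = sorted(rows, key=lambda r: r[0])  # r[0] is the 'email' key
    for i in range(0, len(ordered), chunk_size):
        chunk = ordered[i:i + chunk_size]
        values = ", ".join("('%s')" % r[0] for r in chunk)
        stmts.append("INSERT INTO %s (%s) VALUES %s;" % (table, key_col, values))
    stmts.append("SET UNIQUE_CHECKS=1;")  # re-enable checks after the import
    return stmts

if __name__ == "__main__":
    for s in bulk_load_statements([("c@x.com",), ("a@x.com",), ("b@x.com",)]):
        print(s)
```

Multi-row INSERTs (or LOAD DATA INFILE) cut per-statement overhead, and sorted input keeps the primary-key pages hot in the buffer pool; as the thread shows, though, neither helps if the underlying I/O subsystem is the bottleneck.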
RE: Non-linear degradation in bulk loads?
Actually, I believe we're running 32-bit, with bigmem... Does similar behavior occur in such a scenario?

-JF

-----Original Message-----
From: Heikki Tuuri [mailto:[EMAIL PROTECTED]]
Sent: Sunday, October 16, 2005 2:23 AM
To: mysql@lists.mysql.com
Subject: Re: Non-linear degradation in bulk loads?
Re: Non-linear degradation in bulk loads?
Jon, I do not know. Why not install a 64-bit Linux on your computer?

Regards, Heikki
Oracle/Innobase

----- Original Message -----
From: Jon Frisby [EMAIL PROTECTED]
To: Heikki Tuuri [EMAIL PROTECTED]; mysql@lists.mysql.com
Sent: Monday, October 17, 2005 8:46 PM
Subject: RE: Non-linear degradation in bulk loads?

Actually, I believe we're running 32-bit, with bigmem... Does similar behavior occur in such a scenario?

-JF
RE: Non-linear degradation in bulk loads?
Sorry to spam the group, but I just noticed that I asserted we were on a 2.4 kernel -- we're on 2.6.9-11 as part of CentOS 4.1...

-JF

-----Original Message-----
From: Heikki Tuuri [mailto:[EMAIL PROTECTED]]
Sent: Sunday, October 16, 2005 2:23 AM
To: mysql@lists.mysql.com
Subject: Re: Non-linear degradation in bulk loads?
Re: Non-linear degradation in bulk loads?
Jon, I am not 100 % sure that the problem we saw was on 64-bit Linux; it might have been 32-bit. Anyway, since CentOS is a clone of RHEL, this might be the same file cache phenomenon. I do not know whether one can force the file cache to stay smaller than 4 GB. You can try running some dummy programs that occupy a few GB of memory.

Regards, Heikki
Oracle/Innobase

----- Original Message -----
From: Jon Frisby [EMAIL PROTECTED]
To: Heikki Tuuri [EMAIL PROTECTED]; mysql@lists.mysql.com
Sent: Monday, October 17, 2005 8:49 PM
Subject: RE: Non-linear degradation in bulk loads?

Sorry to spam the group, but I just noticed that I asserted we were on a 2.4 kernel -- we're on 2.6.9-11 as part of CentOS 4.1...

-JF
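Heikki's "dummy program" suggestion -- pinning a few GB of anonymous memory so the kernel has less room for its file cache -- might be sketched as follows. This is an assumption-laden illustration, not something from the thread: the size argument is hypothetical, and whether this actually keeps the RHEL 4 file cache below 4 GB is untested.

```python
import sys

def occupy_memory(num_mb):
    """Allocate num_mb megabytes and touch every page so the kernel must
    back the allocation with real RAM, squeezing the OS file cache."""
    page = 4096
    buf = bytearray(num_mb * 1024 * 1024)
    for off in range(0, len(buf), page):
        buf[off] = 1  # touch one byte per page to force it resident
    return buf

if __name__ == "__main__":
    # Hypothetical usage: `python hog.py 3072` to pin ~3 GB.
    # In practice the process must keep running (e.g. sleep in a loop),
    # since the memory is released as soon as it exits.
    mb = int(sys.argv[1]) if len(sys.argv) > 1 else 16
    hog = occupy_memory(mb)
    print("holding %d MB of resident memory" % mb)
```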
RE: Non-linear degradation in bulk loads?
We only upgraded to CentOS 4.1 due to an emergency data center migration... We weren't prepared to undergo the risk of a 64-bit upgrade at the same time. I believe we're experimenting with a 64-bit kernel now as part of our efforts to diagnose and resolve the I/O issue.

-JF

-----Original Message-----
From: Heikki Tuuri [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 17, 2005 10:52 AM
To: mysql@lists.mysql.com
Subject: Re: Non-linear degradation in bulk loads?

Jon, I do not know. Why not install a 64-bit Linux on your computer?

Regards, Heikki
Oracle/Innobase
RE: Non-linear degradation in bulk loads?
Side question: If I use a 64-bit MySQL build on a 64-bit kernel, is it safe and sane to allocate, say, 6GB to the InnoDB buffer pool? On an 8GB box with a 64-bit software stack, what is the optimum memory allocation for a pure-InnoDB (MyISAM used only for the grant tables) mysql server running as the sole application on the machine? -JF

-Original Message- From: Heikki Tuuri [mailto:[EMAIL PROTECTED] Sent: Monday, October 17, 2005 10:55 AM To: mysql@lists.mysql.com Subject: Re: Non-linear degradation in bulk loads?

Jon, I am not 100% sure that the problem we saw was on a 64-bit Linux. It might have been 32-bit. Anyway, since CentOS is a clone of RHEL, this might be the same file cache phenomenon. I do not know if one can force the file cache to stay smaller than 4 GB. You can try running some dummy programs that occupy a few GB of memory. Regards, Heikki Oracle/Innobase

- Original Message - From: Jon Frisby [EMAIL PROTECTED] To: Heikki Tuuri [EMAIL PROTECTED]; mysql@lists.mysql.com Sent: Monday, October 17, 2005 8:49 PM Subject: RE: Non-linear degradation in bulk loads?

Sorry to spam the group, but I just noticed that I asserted we were on a 2.4 kernel -- we're on 2.6.9-11 as part of CentOS 4.1... -JF

-Original Message- From: Heikki Tuuri [mailto:[EMAIL PROTECTED] Sent: Sunday, October 16, 2005 2:23 AM To: mysql@lists.mysql.com Subject: Re: Non-linear degradation in bulk loads?

Jon, your hardware/OS combination is quite new and unusual. It might indeed be an OS problem. We observed on 64-bit RHEL 4 that when the OS file cache grew bigger than 4 GB, file I/O performance dropped to one tenth of normal. You would not expect that kind of behavior from a 64-bit OS. When you see the slowdown, what does 'top' say about the OS file cache size?

The database is our main site's database, but we've dramatically reduced the load on that machine over the past couple of months through careful optimization of our code.
The box is a dual, dual-core Opteron with 8GB of RAM running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit, of course). We have 1GB allocated to the buffer pool, and our usual 1GB * 3 log files. 8 I/O threads. Load on the box sits at around 6-7, with a large (50%) amount of time spent in wait state, but actual disk throughput to our software RAID array (no longer on a SAN) is quite low: 6-9k blocks/s out, 1-6k blocks/s in. Regards, Heikki

- Original Message - From: Jon Frisby [EMAIL PROTECTED] Newsgroups: mailing.database.myodbc Sent: Friday, October 14, 2005 10:39 PM Subject: RE: Non-linear degradation in bulk loads?

I've tried tweaking the structure of the schema to have, for example, a PRIMARY KEY index on email, no other indexes, and then insert in sorted order -- it made no improvement whatsoever. Another clue that leads me to believe this may be an OS issue: starting a large cp on the same box (from a local filesystem other than the one the InnoDB data pool was on, to NFS) caused MySQL to become COMPLETELY backlogged (we went from ~15-20 connections at any given instant to 750, our max_connections setting). -JF

-Original Message- From: Heikki Tuuri [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 12, 2005 8:15 AM To: mysql@lists.mysql.com Subject: Re: Non-linear degradation in bulk loads?

Jon, hmm... maybe one of the indexes inevitably is in a random order. Please post a typical SHOW INNODB STATUS\G when the inserts happen slowly. What is your my.cnf like? Regards, Heikki Innobase/Oracle

- Original Message - From: Jon Frisby [EMAIL PROTECTED] To: Heikki Tuuri [EMAIL PROTECTED]; mysql@lists.mysql.com Sent: Wednesday, October 12, 2005 3:08 AM Subject: RE: Non-linear degradation in bulk loads?

Two solutions: 1) sort the rows to be inserted on the key 'email' before inserting.
2) Or: http://dev.mysql.com/doc/mysql/en/innodb-tuning.html If you have UNIQUE constraints on secondary keys, starting from MySQL 3.23.52 and 4.0.3, you can speed up table imports by temporarily turning off the uniqueness checks during the import session: SET UNIQUE_CHECKS=0; For big tables, this saves a lot of disk I/O because InnoDB can use its insert buffer to write secondary index records in a batch. But make sure you do not have any duplicates in the rows!

After sending my mail, I discovered SET UNIQUE_CHECKS=0, and subsequent to that it also occurred to me to try inserting the data in sorted order
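Heikki's sizing guidance from earlier in the thread (50-80% of memory to the buffer pool on an InnoDB-only 64-bit server) would translate into a my.cnf along these lines. This is an illustrative sketch for the 8GB box under discussion; the specific numbers are assumptions, not values from Jon's actual configuration:

```ini
# Illustrative my.cnf fragment for an 8GB, InnoDB-only, 64-bit server.
# Values are assumptions based on the thread's 50-80% guidance,
# not Jon's real settings.
[mysqld]
innodb_buffer_pool_size   = 5G     # ~60% of 8GB, inside the 50-80% range
innodb_log_file_size      = 1G     # matches the "1GB * 3 log files" mentioned
innodb_log_files_in_group = 3
innodb_file_io_threads    = 8      # the "8 I/O threads" mentioned in the thread
```

On a 32-bit build, by contrast, the ~2-3GB per-process address space limit is what makes the 1GB buffer pool in Jon's setup a practical ceiling.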
Re: Non-linear degradation in bulk loads?
Jon, your hardware/OS combination is quite new and unusual. It might indeed be an OS problem. We observed on 64-bit RHEL 4 that when the OS file cache grew bigger than 4 GB, file I/O performance dropped to one tenth of normal. You would not expect that kind of behavior from a 64-bit OS. When you see the slowdown, what does 'top' say about the OS file cache size?

The database is our main site's database, but we've dramatically reduced the load on that machine over the past couple of months through careful optimization of our code. The box is a dual, dual-core Opteron with 8GB of RAM running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit of course). We have 1GB allocated to the buffer pool, and our usual 1GB * 3 log files. 8 I/O threads. Load on the box sits at around 6-7, with a large (50%) amount of time spent in wait state, but actual disk throughput to our software RAID array (no longer on a SAN) is quite low: 6-9k blocks/s out, 1-6k blocks/s in. Regards, Heikki

- Original Message - From: Jon Frisby [EMAIL PROTECTED] Newsgroups: mailing.database.myodbc Sent: Friday, October 14, 2005 10:39 PM Subject: RE: Non-linear degradation in bulk loads?

I've tried tweaking the structure of the schema to have, for example, a PRIMARY KEY index on email, no other indexes, and then insert in sorted order -- it made no improvement whatsoever. Another clue that leads me to believe this may be an OS issue: starting a large cp on the same box (from a local filesystem other than the one the InnoDB data pool was on, to NFS) caused MySQL to become COMPLETELY backlogged (we went from ~15-20 connections at any given instant to 750, our max_connections setting). -JF

-Original Message- From: Heikki Tuuri [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 12, 2005 8:15 AM To: mysql@lists.mysql.com Subject: Re: Non-linear degradation in bulk loads?

Jon, hmm... maybe one of the indexes inevitably is in a random order.
Please post a typical SHOW INNODB STATUS\G when the inserts happen slowly. What is your my.cnf like? Regards, Heikki Innobase/Oracle

- Original Message - From: Jon Frisby [EMAIL PROTECTED] To: Heikki Tuuri [EMAIL PROTECTED]; mysql@lists.mysql.com Sent: Wednesday, October 12, 2005 3:08 AM Subject: RE: Non-linear degradation in bulk loads?

Two solutions: 1) sort the rows to be inserted on the key 'email' before inserting. 2) Or: http://dev.mysql.com/doc/mysql/en/innodb-tuning.html If you have UNIQUE constraints on secondary keys, starting from MySQL 3.23.52 and 4.0.3, you can speed up table imports by temporarily turning off the uniqueness checks during the import session: SET UNIQUE_CHECKS=0; For big tables, this saves a lot of disk I/O because InnoDB can use its insert buffer to write secondary index records in a batch. But make sure you do not have any duplicates in the rows!

After sending my mail, I discovered SET UNIQUE_CHECKS=0, and subsequent to that it also occurred to me to try inserting the data in sorted order. Unfortunately, UNIQUE_CHECKS=0 did not work, and even the combination of both did not work. The first chunk (3.4m rows) took ~1.5 minutes, the second ~5 minutes...

At this point I'm inclined to believe that there is something very wrong with the disk subsystem, because of this and other problems (doing a large cp from the data pool filesystem to another filesystem brought the database to a near-halt, among other things).

As a stop-gap solution, I created the table with no indexes, and loaded all the data (it loaded in linear time), and I plan on doing a CREATE UNIQUE INDEX on the table. Will this happen in linear time, or near-linear time?
*sigh* -JF

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
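The import-session recipe Heikki quotes from the manual can be sketched as a single session. The file path and chunk name below are hypothetical; the statements follow the manual's description for MySQL >= 3.23.52 / 4.0.3:

```sql
-- Sketch of the manual's UNIQUE_CHECKS recipe; file name is made up.
SET UNIQUE_CHECKS = 0;   -- skip uniqueness checks so InnoDB can batch
                         -- secondary-index writes through the insert
                         -- buffer; the input must be duplicate-free
LOAD DATA INFILE '/data/chunk01.txt' INTO TABLE test;
SET UNIQUE_CHECKS = 1;   -- restore checking for normal operation
```

The caveat in the quoted text is load-bearing: with checks off, duplicate keys in the input corrupt the unique index rather than being rejected.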
RE: Non-linear degradation in bulk loads?
I've tried tweaking the structure of the schema to have, for example, a PRIMARY KEY index on email, no other indexes, and then insert in sorted order -- made no improvement whatsoever. Another clue that leads me to believe that this may be an OS issue: Starting a large cp on the same box (from a local filesystem other than the one the InnoDB data pool was on, to NFS) caused MySQL to become COMPLETELY backlogged (we went from ~15-20 connections at any given instant to 750 (our max_connections setting)). -JF -Original Message- From: Heikki Tuuri [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 12, 2005 8:15 AM To: mysql@lists.mysql.com Subject: Re: Non-linear degradation in bulk loads? Jon, hmm... maybe one of the indexes inevitably is in a random order. Please post a typical SHOW INNODB STATUS\G when the inserts happen slowly. What is your my.cnf like? Regards, Heikki Innobase/Oracle - Alkuperäinen viesti - Lähettäjä: Jon Frisby [EMAIL PROTECTED] Vastaanottaja: Heikki Tuuri [EMAIL PROTECTED]; mysql@lists.mysql.com Lähetetty: Wednesday, October 12, 2005 3:08 AM Aihe: RE: Non-linear degradation in bulk loads? Two solutions: 1) sort the rows to be inserted on the key 'email' before inserting. 2) Or: http://dev.mysql.com/doc/mysql/en/innodb-tuning.html If you have UNIQUE constraints on secondary keys, starting from MySQL 3.23.52 and 4.0.3, you can speed up table imports by temporarily turning off the uniqueness checks during the import session: SET UNIQUE_CHECKS=0; For big tables, this saves a lot of disk I/O because InnoDB can use its insert buffer to write secondary index records in a batch. But make sure you do not have any duplicates in the rows! After sending my mail, I discovered SET UNIQUE_CHECKS=0, and subsequent to that it also occurred to me to try putting the data in in sorted order. Unfortunately, doing UNIQUE_CHECKS=0 did not work, and even the combination of both did not work. First chunk (3.4m rows) was ~1.5 minutes, second was ~5 minutes... 
At this point I'm inclined to believe that there is something very wrong with the disk subsystem, because of this and other problems (doing a large cp from the data pool filesystem to another filesystem brought the database to a near-halt, among other things). As a stop-gap solution, I created the table with no indexes, and loaded all the data (it loaded in linear time), and I plan on doing a CREATE UNIQUE INDEX on the table. Will this happen in linear time, or near-linear time? *sigh* -JF
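Jon's stop-gap (load into an index-free table, build the indexes afterwards) might look like the following against the thread's `test` schema; the file name and the decision to build both keys afterwards are assumptions:

```sql
-- Load first, index later: the table starts with no keys at all.
CREATE TABLE test (
  email varchar(255) NOT NULL default '',
  when_happened datetime NOT NULL default '0000-00-00 00:00:00'
) TYPE=InnoDB;

LOAD DATA INFILE '/data/chunk01.txt' INTO TABLE test;  -- repeat per chunk

-- Build the keys only once all rows are in place.
CREATE UNIQUE INDEX email ON test (email);
CREATE INDEX when_happened ON test (when_happened);
```

On MySQL 4.0, CREATE INDEX is mapped to an ALTER TABLE that rebuilds the table, so the index build is a sort-and-rewrite pass rather than millions of random B-tree insertions, which is why it tends to behave close to linearly.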
Re: Non-linear degradation in bulk loads?
Jon, hmm... maybe one of the indexes inevitably is in a random order. Please post a typical SHOW INNODB STATUS\G when the inserts happen slowly. What is your my.cnf like? Regards, Heikki Innobase/Oracle

- Original Message - From: Jon Frisby [EMAIL PROTECTED] To: Heikki Tuuri [EMAIL PROTECTED]; mysql@lists.mysql.com Sent: Wednesday, October 12, 2005 3:08 AM Subject: RE: Non-linear degradation in bulk loads?

Two solutions: 1) sort the rows to be inserted on the key 'email' before inserting. 2) Or: http://dev.mysql.com/doc/mysql/en/innodb-tuning.html If you have UNIQUE constraints on secondary keys, starting from MySQL 3.23.52 and 4.0.3, you can speed up table imports by temporarily turning off the uniqueness checks during the import session: SET UNIQUE_CHECKS=0; For big tables, this saves a lot of disk I/O because InnoDB can use its insert buffer to write secondary index records in a batch. But make sure you do not have any duplicates in the rows!

After sending my mail, I discovered SET UNIQUE_CHECKS=0, and subsequent to that it also occurred to me to try inserting the data in sorted order. Unfortunately, UNIQUE_CHECKS=0 did not work, and even the combination of both did not work. The first chunk (3.4m rows) took ~1.5 minutes, the second ~5 minutes...

At this point I'm inclined to believe that there is something very wrong with the disk subsystem, because of this and other problems (doing a large cp from the data pool filesystem to another filesystem brought the database to a near-halt, among other things). As a stop-gap solution, I created the table with no indexes, and loaded all the data (it loaded in linear time), and I plan on doing a CREATE UNIQUE INDEX on the table. Will this happen in linear time, or near-linear time? *sigh* -JF
Re: Non-linear degradation in bulk loads?
Jon Frisby wrote: Everyone, we're trying to do some bulk data loads on several different tables (on several different machines, using several different techniques) and seeing dramatically worse-than-linear performance. We've tried the bulk-INSERT syntax and the LOAD DATA INFILE syntax. We've done ALTER TABLE ... DISABLE KEYS, SET FOREIGN_KEY_CHECKS=0 (where appropriate), and so forth. The one of most immediate concern is a table of the form:

CREATE TABLE `test` (
  `email` varchar(255) NOT NULL default '',
  `when_happened` datetime NOT NULL default '0000-00-00 00:00:00',
  UNIQUE KEY `email` (`email`),
  KEY `when_happened` (`when_happened`)
) TYPE=InnoDB;

I'm loading data using LOAD DATA INFILE with chunks containing 3.4m rows each (~135MB files). The first chunk was very quick (about 1.5 minutes), but the tenth chunk has taken 22.6 hours and is still going. (It's been getting progressively slower with each chunk...)

The database is our main site's database, but we've dramatically reduced the load on that machine over the past couple of months through careful optimization of our code. The box is a dual, dual-core Opteron with 8GB of RAM running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit of course). We have 1GB allocated to the buffer pool, and our usual 1GB * 3 log files. 8 I/O threads. Load on the box sits at around 6-7, with a large (50%) amount of time spent in wait state, but actual disk throughput to our software RAID array (no longer on a SAN) is quite low: 6-9k blocks/s out, 1-6k blocks/s in.

Something *has* to be wrong here, but we're not sure what we've missed. We've restored larger data sets from a mysqldump in the past in dramatically less time on far inferior hardware.
(A superset of this same data, to a schema which is also a superset, PLUS a bunch of other rather large tables -- all in ~8 hours on a 3Ware RAID array on a dual Xeon w/ 4GB of RAM.) We're inclined to believe that this is a configuration problem, as opposed to a driver or hardware problem, given the non-linear nature of the performance degradation. This implies we're doing something truly stupid with our loads. What could cause this kind of strangeness? -JF

Hi Jon, I experienced this same non-linear degradation during large imports, exactly like you are describing, about 18 months ago. I don't remember if I found a specific cause, but I am fairly certain that it was related to a few issues, and that we did resolve it. I have not seen this happen with MyISAM tables, and we were able to import our full data by breaking it into chunks and waiting between each chunk, so I believe it to be related to InnoDB's logs in some way. Since you are already importing your data in chunks, try making each chunk a separate transaction, or waiting until disk activity slows before loading the next chunk. Also, there have been major improvements to InnoDB in the later 4.1 releases, so if possible, I would suggest upgrading. Hope that helps! Devananda vdv
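Devananda's suggestion of making each chunk its own transaction could be sketched as below. The chunk file names are hypothetical, and the explicit autocommit toggle is an assumption about how the session would be set up:

```sql
-- One explicit transaction per chunk; file names are made up.
SET AUTOCOMMIT = 0;

BEGIN;
LOAD DATA INFILE '/data/chunk01.txt' INTO TABLE test;
COMMIT;   -- pause here until disk activity settles, per the advice

BEGIN;
LOAD DATA INFILE '/data/chunk02.txt' INTO TABLE test;
COMMIT;
```

The point of the pause between COMMITs is to let InnoDB flush dirty pages and recycle log space before the next burst of inserts arrives.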
Re: Non-linear degradation in bulk loads?
Jon, my guess is that the inserts into the UNIQUE secondary index cause the workload to be seriously disk-bound. Two solutions: 1) sort the rows to be inserted on the key 'email' before inserting. 2) Or: http://dev.mysql.com/doc/mysql/en/innodb-tuning.html If you have UNIQUE constraints on secondary keys, starting from MySQL 3.23.52 and 4.0.3, you can speed up table imports by temporarily turning off the uniqueness checks during the import session: SET UNIQUE_CHECKS=0; For big tables, this saves a lot of disk I/O because InnoDB can use its insert buffer to write secondary index records in a batch. But make sure you do not have any duplicates in the rows!

Note that now you can get support on this MySQL mailing list from a Vice President of Oracle. I hope that the level of support improves. Best regards, Heikki Vice President, server technology Oracle/Innobase Oy

... Everyone, we're trying to do some bulk data loads on several different tables (on several different machines, using several different techniques) and seeing dramatically worse-than-linear performance. We've tried the bulk-INSERT syntax and the LOAD DATA INFILE syntax. We've done ALTER TABLE ... DISABLE KEYS, SET FOREIGN_KEY_CHECKS=0 (where appropriate), and so forth. The one of most immediate concern is a table of the form:

CREATE TABLE `test` (
  `email` varchar(255) NOT NULL default '',
  `when_happened` datetime NOT NULL default '0000-00-00 00:00:00',
  UNIQUE KEY `email` (`email`),
  KEY `when_happened` (`when_happened`)
) TYPE=InnoDB;

I'm loading data using LOAD DATA INFILE with chunks containing 3.4m rows each (~135MB files). The first chunk was very quick (about 1.5 minutes), but the tenth chunk has taken 22.6 hours and is still going. (It's been getting progressively slower with each chunk...) The database is our main site's database, but we've dramatically reduced the load on that machine over the past couple of months through careful optimization of our code.
The box is a dual, dual-core Opteron with 8GB of RAM running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit of course). We have 1GB allocated to the buffer pool, and our usual 1GB * 3 log files. 8 I/O threads. Load on the box sits at around 6-7, with a large (50%) amount of time spent in wait state, but actual disk throughput to our software RAID array (no longer on a SAN) is quite low: 6-9k blocks/s out, 1-6k blocks/s in. Something *has* to be wrong here, but we're not sure what we've missed. We've restored larger data sets from a mysqldump in the past in dramatically less time on far inferior hardware. (A superset of this same data, to a schema which is also a superset, PLUS a bunch of other rather large tables -- all in ~8 hours on a 3Ware RAID array on a dual Xeon w/ 4GB of RAM.) We're inclined to believe that this is a configuration problem, as opposed to a driver or hardware problem, given the non-linear nature of the performance degradation. This implies we're doing something truly stupid with our loads. What could cause this kind of strangeness? -JF
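Heikki's solution 1 (insert in key order) can be approximated entirely inside MySQL with an index-free staging table, so rows reach `test` already sorted on `email`. The staging table name and file path below are made up for illustration:

```sql
-- Staging table with no indexes: loading it is cheap and order-free.
CREATE TABLE test_staging (
  email varchar(255) NOT NULL default '',
  when_happened datetime NOT NULL default '0000-00-00 00:00:00'
) TYPE=InnoDB;

LOAD DATA INFILE '/data/chunk01.txt' INTO TABLE test_staging;

-- Feed the real table in key order, so the UNIQUE index on email
-- grows by appending to the rightmost leaf instead of by random
-- insertions scattered across the B-tree.
INSERT INTO test SELECT * FROM test_staging ORDER BY email;

DROP TABLE test_staging;
```

Sorting the input files externally (e.g. with the Unix sort utility) before LOAD DATA would achieve the same effect without the intermediate table.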
Re: Non-linear degradation in bulk loads?
At 11:41 PM 10/10/2005, you wrote: Hi Jon, Well, maybe the next suggestions might help you. DISABLE KEYS applies to non-unique keys only, so I suggest you focus on your unique Email key. You could do some tests with: a. drop the unique key on Email; b. load the various bulks; c. after loading, define Email, e.g., as a (unique) Primary Key. After b. you can also do ALTER TABLE ... ORDER BY Email, to bring the Email index in step with the physical order of the data. Hope it helps... Best wishes, Cor

- Original Message - From: Jon Frisby [EMAIL PROTECTED] To: mysql@lists.mysql.com Sent: Tuesday, October 11, 2005 4:12 AM Subject: Non-linear degradation in bulk loads?

Everyone, we're trying to do some bulk data loads on several different tables (on several different machines, using several different techniques) and seeing dramatically worse-than-linear performance. We've tried the bulk-INSERT syntax and the LOAD DATA INFILE syntax. We've done ALTER TABLE ... DISABLE KEYS, SET FOREIGN_KEY_CHECKS=0 (where appropriate), and so forth. The one of most immediate concern is a table of the form:

CREATE TABLE `test` (
  `email` varchar(255) NOT NULL default '',
  `when_happened` datetime NOT NULL default '0000-00-00 00:00:00',
  UNIQUE KEY `email` (`email`),
  KEY `when_happened` (`when_happened`)
) TYPE=InnoDB;

I'm loading data using LOAD DATA INFILE with chunks containing 3.4m rows each (~135MB files). The first chunk was very quick (about 1.5 minutes), but the tenth chunk has taken 22.6 hours and is still going. (It's been getting progressively slower with each chunk...) The database is our main site's database, but we've dramatically reduced the load on that machine over the past couple of months through careful optimization of our code. The box is a dual, dual-core Opteron with 8GB of RAM running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit of course). We have 1GB allocated to the buffer pool, and our usual 1GB * 3 log files. 8 I/O threads.
Load on the box sits at around 6-7, with a large (50%) amount of time spent in wait state, but actual disk throughput to our software RAID array (no longer on a SAN) is quite low: 6-9k blocks/s out, 1-6k blocks/s in. Something *has* to be wrong here, but we're not sure what we've missed. We've restored larger data sets from a mysqldump in the past in dramatically less time on far inferior hardware. (A superset of this same data, to a schema which is also a superset, PLUS a bunch of other rather large tables -- all in ~8 hours on a 3Ware RAID array on a dual Xeon w/ 4GB of RAM.) We're inclined to believe that this is a configuration problem, as opposed to a driver or hardware problem, given the non-linear nature of the performance degradation. This implies we're doing something truly stupid with our loads. What could cause this kind of strangeness? -JF

JF, it's likely the indexes that are causing the slowdown. If you remove all indexes from the table definition and start with an empty table, you should see a dramatic speed increase. When all the data has been loaded, use one ALTER TABLE command to rebuild all the indexes. You could try to optimize the table after each load to see if that speeds things up (this will rebalance the index distribution). I was able to load 100 million rows relatively fast, but failed miserably at 500 million rows because of insufficient memory. Mike

P.S. I suppose you already know that when you use LOAD DATA to load data into an empty table, it won't update the indexes until the load has completed. This is why loading data into an empty table is much faster than loading data into a table that has rows in it (even if it is only 1 row).
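Mike's advice, rebuilding every index with a single ALTER TABLE after the load finishes, would look roughly like this against the thread's schema (index names assumed to match the original CREATE TABLE):

```sql
-- After loading into an index-free table, add all keys in one
-- ALTER TABLE so the table is rebuilt only once, not once per index.
ALTER TABLE test
  ADD UNIQUE KEY email (email),
  ADD KEY when_happened (when_happened);

-- Optionally, between chunk loads, as Mike suggests:
OPTIMIZE TABLE test;
```

Combining both additions in one statement matters because each ALTER TABLE on this MySQL version copies the whole table; two separate statements would pay that cost twice.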
RE: Non-linear degradation in bulk loads?
Two solutions: 1) sort the rows to be inserted on the key 'email' before inserting. 2) Or: http://dev.mysql.com/doc/mysql/en/innodb-tuning.html If you have UNIQUE constraints on secondary keys, starting from MySQL 3.23.52 and 4.0.3, you can speed up table imports by temporarily turning off the uniqueness checks during the import session: SET UNIQUE_CHECKS=0; For big tables, this saves a lot of disk I/O because InnoDB can use its insert buffer to write secondary index records in a batch. But make sure you do not have any duplicates in the rows!

After sending my mail, I discovered SET UNIQUE_CHECKS=0, and subsequent to that it also occurred to me to try inserting the data in sorted order. Unfortunately, UNIQUE_CHECKS=0 did not work, and even the combination of both did not work. The first chunk (3.4m rows) took ~1.5 minutes, the second ~5 minutes... At this point I'm inclined to believe that there is something very wrong with the disk subsystem, because of this and other problems (doing a large cp from the data pool filesystem to another filesystem brought the database to a near-halt, among other things). As a stop-gap solution, I created the table with no indexes, and loaded all the data (it loaded in linear time), and I plan on doing a CREATE UNIQUE INDEX on the table. Will this happen in linear time, or near-linear time? *sigh* -JF
Re: Non-linear degradation in bulk loads?
Not sure, but given that you suffer from non-linear degradation in performance, my guess is you might be extending your ibdata file too frequently during the batch load process. Check the innodb_data_file_path variable in my.cnf for more details. Cheers, Manoj

On 10/11/05, Jon Frisby [EMAIL PROTECTED] wrote: Everyone, we're trying to do some bulk data loads on several different tables (on several different machines, using several different techniques) and seeing dramatically worse-than-linear performance. We've tried the bulk-INSERT syntax and the LOAD DATA INFILE syntax. We've done ALTER TABLE ... DISABLE KEYS, SET FOREIGN_KEY_CHECKS=0 (where appropriate), and so forth. The one of most immediate concern is a table of the form:

CREATE TABLE `test` (
  `email` varchar(255) NOT NULL default '',
  `when_happened` datetime NOT NULL default '0000-00-00 00:00:00',
  UNIQUE KEY `email` (`email`),
  KEY `when_happened` (`when_happened`)
) TYPE=InnoDB;

I'm loading data using LOAD DATA INFILE with chunks containing 3.4m rows each (~135MB files). The first chunk was very quick (about 1.5 minutes), but the tenth chunk has taken 22.6 hours and is still going. (It's been getting progressively slower with each chunk...) The database is our main site's database, but we've dramatically reduced the load on that machine over the past couple of months through careful optimization of our code. The box is a dual, dual-core Opteron with 8GB of RAM running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit of course). We have 1GB allocated to the buffer pool, and our usual 1GB * 3 log files. 8 I/O threads. Load on the box sits at around 6-7, with a large (50%) amount of time spent in wait state, but actual disk throughput to our software RAID array (no longer on a SAN) is quite low: 6-9k blocks/s out, 1-6k blocks/s in. Something *has* to be wrong here, but we're not sure what we've missed.
We've restored larger data sets from a mysqldump in the past in dramatically less time on far inferior hardware. (A superset of this same data, to a schema which is also a superset, PLUS a bunch of other rather large tables -- all in ~8 hours on a 3Ware RAID array on a dual Xeon w/ 4GB of RAM.) We're inclined to believe that this is a configuration problem, as opposed to a driver or hardware problem, given the non-linear nature of the performance degradation. This implies we're doing something truly stupid with our loads. What could cause this kind of strangeness? -JF
RE: Non-linear degradation in bulk loads?
Manoj, thanks for the reply! Unfortunately, I failed to note that we don't have an auto-extending data pool. Our data pool is about 212GB, in 4GB chunks, with about 4.5GB free right now. We extend it explicitly, and monitor to make sure it doesn't fill up. -JF

-Original Message- From: Manoj [mailto:[EMAIL PROTECTED] Sent: Monday, October 10, 2005 8:51 PM To: Jon Frisby Cc: mysql@lists.mysql.com Subject: Re: Non-linear degradation in bulk loads?

Not sure, but given that you suffer from non-linear degradation in performance, my guess is you might be extending your ibdata file too frequently during the batch load process. Check the innodb_data_file_path variable in my.cnf for more details. Cheers, Manoj

On 10/11/05, Jon Frisby [EMAIL PROTECTED] wrote: Everyone, we're trying to do some bulk data loads on several different tables (on several different machines, using several different techniques) and seeing dramatically worse-than-linear performance. We've tried the bulk-INSERT syntax and the LOAD DATA INFILE syntax. We've done ALTER TABLE ... DISABLE KEYS, SET FOREIGN_KEY_CHECKS=0 (where appropriate), and so forth. The one of most immediate concern is a table of the form:

CREATE TABLE `test` (
  `email` varchar(255) NOT NULL default '',
  `when_happened` datetime NOT NULL default '0000-00-00 00:00:00',
  UNIQUE KEY `email` (`email`),
  KEY `when_happened` (`when_happened`)
) TYPE=InnoDB;

I'm loading data using LOAD DATA INFILE with chunks containing 3.4m rows each (~135MB files). The first chunk was very quick (about 1.5 minutes), but the tenth chunk has taken 22.6 hours and is still going. (It's been getting progressively slower with each chunk...) The database is our main site's database, but we've dramatically reduced the load on that machine over the past couple of months through careful optimization of our code. The box is a dual, dual-core Opteron with 8GB of RAM running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit of course).
We have 1GB allocated to the buffer pool, and our usual 1GB * 3 log files. 8 I/O threads. Load on the box sits at around 6-7, with a large (50%) amount of time spent in wait state, but actual disk throughput to our software RAID array (no longer on a SAN) is quite low: 6-9k blocks/s out, 1-6k blocks/s in. Something *has* to be wrong here, but we're not sure what we've missed. We've restored larger data sets from a mysqldump in the past in dramatically less time on far inferior hardware. (A superset of this same data, to a schema which is also a superset, PLUS a bunch of other rather large tables -- all in ~8 hours on a 3Ware RAID array on a dual Xeon w/ 4GB of RAM.) We're inclined to believe that this is a configuration problem, as opposed to a driver or hardware problem, given the non-linear nature of the performance degradation. This implies we're doing something truly stupid with our loads. What could cause this kind of strangeness? -JF
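An explicitly extended (non-autoextending) data pool like the one Jon describes is declared through innodb_data_file_path. The sketch below shows only the shape of such a setting, with made-up file names and just three files, not Jon's actual 212GB layout:

```ini
# Fixed-size InnoDB data files, extended by hand when space runs low.
# File names and count are illustrative; Jon's pool is ~53 such 4GB files.
[mysqld]
innodb_data_file_path = ibdata1:4096M;ibdata2:4096M;ibdata3:4096M
```

Appending `:autoextend` to the last file is what would make the pool grow automatically; its absence here is the point, since Manoj's theory was that repeated auto-extension during the load was causing the stalls.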
Re: Non-linear degradation in bulk loads?
Hi Jon, Well, maybe the next suggestions might help you. DISABLE KEYS applies to non-unique keys only, so I suggest you focus on your unique Email key. You could do some tests with: a. drop the unique key on Email; b. load the various bulks; c. after loading, define Email, e.g., as a (unique) Primary Key. After b. you can also do ALTER TABLE ... ORDER BY Email, to bring the Email index in step with the physical order of the data. Hope it helps... Best wishes, Cor

- Original Message - From: Jon Frisby [EMAIL PROTECTED] To: mysql@lists.mysql.com Sent: Tuesday, October 11, 2005 4:12 AM Subject: Non-linear degradation in bulk loads?

Everyone, we're trying to do some bulk data loads on several different tables (on several different machines, using several different techniques) and seeing dramatically worse-than-linear performance. We've tried the bulk-INSERT syntax and the LOAD DATA INFILE syntax. We've done ALTER TABLE ... DISABLE KEYS, SET FOREIGN_KEY_CHECKS=0 (where appropriate), and so forth. The one of most immediate concern is a table of the form:

CREATE TABLE `test` (
  `email` varchar(255) NOT NULL default '',
  `when_happened` datetime NOT NULL default '0000-00-00 00:00:00',
  UNIQUE KEY `email` (`email`),
  KEY `when_happened` (`when_happened`)
) TYPE=InnoDB;

I'm loading data using LOAD DATA INFILE with chunks containing 3.4m rows each (~135MB files). The first chunk was very quick (about 1.5 minutes), but the tenth chunk has taken 22.6 hours and is still going. (It's been getting progressively slower with each chunk...) The database is our main site's database, but we've dramatically reduced the load on that machine over the past couple of months through careful optimization of our code. The box is a dual, dual-core Opteron with 8GB of RAM running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit of course). We have 1GB allocated to the buffer pool, and our usual 1GB * 3 log files. 8 I/O threads.
Load on the box sits at around 6-7, with a large (50%) amount of time spent in wait state, but actual disk throughput to our software RAID array (no longer on a SAN) is quite low: 6-9k blocks/s out, 1-6k blocks/s in. Something *has* to be wrong here, but we're not sure what we've missed. We've restored larger data sets from a mysqldump in the past in dramatically less time on far inferior hardware. (A superset of this same data, to a schema which is also a superset, PLUS a bunch of other rather large tables -- all in ~8 hours on a 3Ware RAID array on a dual Xeon w/ 4GB of RAM.) We're inclined to believe that this is a configuration problem, as opposed to a driver or hardware problem, given the non-linear nature of the performance degradation. This implies we're doing something truly stupid with our loads. What could cause this kind of strangeness? -JF
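Cor's test sequence (drop the unique key, load the bulks, reorder, then re-add the key) could be sketched as below against the thread's schema. Note that for InnoDB the ORDER BY step is of doubtful lasting value, since InnoDB stores rows in clustered-index order regardless; the sketch simply transcribes the suggestion as given:

```sql
-- a. drop the unique key before the bulk loads
ALTER TABLE test DROP INDEX email;

-- b. load the various bulks (file name hypothetical; repeat per chunk)
LOAD DATA INFILE '/data/chunk01.txt' INTO TABLE test;

-- optional: rewrite the table with rows physically ordered by email
ALTER TABLE test ORDER BY email;

-- c. re-add the unique key over the now-loaded data
ALTER TABLE test ADD UNIQUE KEY email (email);
```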