Re: 1 day 28 min insert
Mikhail Entaltsev wrote: You _could_ try adding an identical primary key to the stat_in table as you have on the 321st_stat table. However, since we need all of the rows from stat_in in the results, I am not sure that it will help speed up the join (because it's a left join). Even though I think the index would be ignored, it is worth a try to see if it would make a difference with a new EXPLAIN. Tried this, setup a matching index on the temp table, it took 17 hours to load the input file into the temp stat_in table, so it's definitly not going to save me any time I've used enable/disable keys before, but 2 problems, one it was only 5 % faster, and two, I will have primary key violations when I enable the primary key, it wont enable it, at least that's my understanding of it. Matt -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
Mikhail Entaltsev wrote: Hi, insert into 321st_stat select * from stat_in group by primary key fields from 321st_stat table; did you try to use this query? Best regards, Mikhail. Ran it, it took at least 24 hours, it finished but never gave me the total time, when I checked the server mysql dropped me back to the command prompt, with no time or number of records :( Matt -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
Could you execute show create table 321st_stat and show create table stat_in and send results back? Best regards, Mikhail. - Original Message - From: matt ryan [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Friday, August 27, 2004 4:45 PM Subject: Re: 1 day 28 min insert Mikhail Entaltsev wrote: Hi, insert into 321st_stat select * from stat_in group by primary key fields from 321st_stat table; did you try to use this query? Best regards, Mikhail. Ran it, it took at least 24 hours, it finished but never gave me the total time, when I checked the server mysql dropped me back to the command prompt, with no time or number of records :( Matt -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
Mikhail Entaltsev wrote: Could you execute show create table 321st_stat and show create table stat_in and send results back? I have no key's on the temp table, stat_in, do you think adding keys on the whole primary key would be faster? I wasnt sure if you could join mysql keys, the key is called primary key so would it be a.primary key = b.primary key ? mysql explain select a.* from stat_in a left outer join 321st_stat b on a.don=b.don and a.dic=b.dic and a.niin=b.niin and a.sta=b.sta and a.qty=b.qty and a.fr_ric=b.fr_ric and a.suf=b.suf and a.dte_txn =b.dte_txn where isnull(b.don); | id | select_type | table | type | possible_keys| key | key_len | ref| rows | Extra | | 1 | SIMPLE | a | ALL| NULL | NULL| NULL | NULL| 77269086 | | | 1 | SIMPLE | b | eq_ref | PRIMARY,don,niin,dic,dte_txn | PRIMARY |39 | finlog.a.dic,finlog.a.niin,finlog.a.fr_ric,finlog.a.don,finlog.a.suf,finlog.a.dte_txn,finlog.a.sta | 1 | Using where; Not exists | 2 rows in set (0.11 sec) ---+ | 321st_stat | CREATE TABLE `321st_stat` ( `dic` char(3) NOT NULL default '', `fr_ric` char(3) NOT NULL default '', `niin` char(11) NOT NULL default '', `ui` char(2) NOT NULL default '', `qty` char(5) NOT NULL default '', `don` char(14) NOT NULL default '', `suf` char(1) NOT NULL default '', `dte_txn` char(5) NOT NULL default '', `ship_to` char(3) NOT NULL default '', `sta` char(2) NOT NULL default '', `lst_sos` char(3) NOT NULL default '', `esd` char(4) NOT NULL default '', `stor` char(3) NOT NULL default '', `d_t` char(4) NOT NULL default '', `ctasc` char(10) NOT NULL default '', PRIMARY KEY (`dic`,`niin`,`fr_ric`,`don`,`suf`,`dte_txn`,`sta`), KEY `don` (`don`), KEY `niin` (`niin`), KEY `stor` (`stor`), KEY `dic` (`dic`), KEY `ctasc` (`ctasc`), KEY `dte_txn` (`dte_txn`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=900,000,000 PACK_KEYS=1 | 1 row in set (0.03 sec) | stat_in | CREATE TABLE `stat_in` ( `dic` char(3) NOT NULL default '', `fr_ric` char(3) NOT NULL default '', `niin` char(11) NOT NULL default '', `ui` char(2) NOT NULL default '', `qty` char(5) NOT NULL default '', `don` char(14) NOT NULL default '', `suf` char(1) NOT NULL default '', `dte_txn` char(5) NOT NULL default '', `ship_to` char(3) NOT NULL default '', `sta` char(2) NOT NULL default '', `lst_sos` char(3) NOT NULL default '', `esd` char(4) NOT NULL default '', `stor` char(3) NOT NULL default '', `d_t` char(4) NOT NULL default '', `ctasc` char(10) NOT NULL default '' ) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=99,999,999 PACK_KEYS=1 | -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
I see two things I would change: First, the column 321st_stat.dic is the first column of your primary key and has a second index on just it. That second index is redundant and could be deleted. Second, in your WHERE clause you say : WHERE isnull(b.don) . That forces the engine to run a function on every row of the b.don column to return a logical value. The simpler and faster thing to have said is : WHERE b.don is null . That is a direct, native comparison and will use an index if one is available. You _could_ try adding an identical primary key to the stat_in table as you have on the 321st_stat table. However, since we need all of the rows from stat_in in the results, I am not sure that it will help speed up the join (because it's a left join). Even though I think the index would be ignored, it is worth a try to see if it would make a difference with a new EXPLAIN.. Have you considered wrapping your insert with : ALTER TABLE `321st_stat` DISABLE KEYS ALTER TABLE `321st_stat` ENABLE KEYS ? matt ryan [EMAIL PROTECTED] wrote on 08/27/2004 03:25:58 PM: Mikhail Entaltsev wrote: Could you execute show create table 321st_stat and show create table stat_in and send results back? I have no key's on the temp table, stat_in, do you think adding keys on the whole primary key would be faster? I wasnt sure if you could join mysql keys, the key is called primary key so would it be a.primary key = b.primary key ? mysql explain select a.* from stat_in a left outer join 321st_stat b on a.don=b.don and a.dic=b.dic and a.niin=b.niin and a.sta=b.sta and a.qty=b.qty and a.fr_ric=b.fr_ric and a.suf=b.suf and a.dte_txn =b.dte_txn where isnull(b.don); | id | select_type | table | type | possible_keys| key | key_len | ref| rows | Extra | | 1 | SIMPLE | a | ALL| NULL | NULL| NULL | NULL| 77269086 | | | 1 | SIMPLE | b | eq_ref | PRIMARY,don,niin,dic,dte_txn | PRIMARY |39 | finlog.a.dic,finlog.a.niin,finlog.a.fr_ric,finlog.a.don,finlog.a. suf,finlog.a.dte_txn,finlog.a.sta | 1 | Using where; Not exists | 2 rows in set (0.11 sec) ---+ | 321st_stat | CREATE TABLE `321st_stat` ( `dic` char(3) NOT NULL default '', `fr_ric` char(3) NOT NULL default '', `niin` char(11) NOT NULL default '', `ui` char(2) NOT NULL default '', `qty` char(5) NOT NULL default '', `don` char(14) NOT NULL default '', `suf` char(1) NOT NULL default '', `dte_txn` char(5) NOT NULL default '', `ship_to` char(3) NOT NULL default '', `sta` char(2) NOT NULL default '', `lst_sos` char(3) NOT NULL default '', `esd` char(4) NOT NULL default '', `stor` char(3) NOT NULL default '', `d_t` char(4) NOT NULL default '', `ctasc` char(10) NOT NULL default '', PRIMARY KEY (`dic`,`niin`,`fr_ric`,`don`,`suf`,`dte_txn`,`sta`), KEY `don` (`don`), KEY `niin` (`niin`), KEY `stor` (`stor`), KEY `dic` (`dic`), KEY `ctasc` (`ctasc`), KEY `dte_txn` (`dte_txn`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=900,000,000 PACK_KEYS=1 | 1 row in set (0.03 sec) | stat_in | CREATE TABLE `stat_in` ( `dic` char(3) NOT NULL default '', `fr_ric` char(3) NOT NULL default '', `niin` char(11) NOT NULL default '', `ui` char(2) NOT NULL default '', `qty` char(5) NOT NULL default '', `don` char(14) NOT NULL default '', `suf` char(1) NOT NULL default '', `dte_txn` char(5) NOT NULL default '', `ship_to` char(3) NOT NULL default '', `sta` char(2) NOT NULL default '', `lst_sos` char(3) NOT NULL default '', `esd` char(4) NOT NULL default '', `stor` char(3) NOT NULL default '', `d_t` char(4) NOT NULL default '', `ctasc` char(10) NOT NULL default '' ) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=99,999,999 PACK_KEYS=1 | -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
First of all, IMHO index 321st_stat.dic is useless since you have dic as the first field in 321st_stat.PRIMARY KEY. Second, I would recommend add KEY (`dic`,`niin`,`fr_ric`,`don`,`suf`,`dte_txn`,`sta`) on stat_in table and set PACK_KEYS=0 for stat_in table. Then measure execution time of select * from stat_in group by dic,niin,fr_ric,don,suf,dte_txn,sta; If it is relatively short then the problem is in number of keys and PACK_KEYS for 321st_stat table. Please, let me know about your results. Best regards, Mikhail. - Original Message - From: matt ryan [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Friday, August 27, 2004 9:25 PM Subject: Re: 1 day 28 min insert Mikhail Entaltsev wrote: Could you execute show create table 321st_stat and show create table stat_in and send results back? I have no key's on the temp table, stat_in, do you think adding keys on the whole primary key would be faster? I wasnt sure if you could join mysql keys, the key is called primary key so would it be a.primary key = b.primary key ? mysql explain select a.* from stat_in a left outer join 321st_stat b on a.don=b.don and a.dic=b.dic and a.niin=b.niin and a.sta=b.sta and a.qty=b.qty and a.fr_ric=b.fr_ric and a.suf=b.suf and a.dte_txn =b.dte_txn where isnull(b.don); | id | select_type | table | type | possible_keys| key | key_len | ref| rows | Extra | | 1 | SIMPLE | a | ALL| NULL | NULL| NULL | NULL| 77269086 | | | 1 | SIMPLE | b | eq_ref | PRIMARY,don,niin,dic,dte_txn | PRIMARY |39 | finlog.a.dic,finlog.a.niin,finlog.a.fr_ric,finlog.a.don,finlog.a.suf,finlog. a.dte_txn,finlog.a.sta | 1 | Using where; Not exists | 2 rows in set (0.11 sec) ---+ | 321st_stat | CREATE TABLE `321st_stat` ( `dic` char(3) NOT NULL default '', `fr_ric` char(3) NOT NULL default '', `niin` char(11) NOT NULL default '', `ui` char(2) NOT NULL default '', `qty` char(5) NOT NULL default '', `don` char(14) NOT NULL default '', `suf` char(1) NOT NULL default '', `dte_txn` char(5) NOT NULL default '', `ship_to` char(3) NOT NULL default '', `sta` char(2) NOT NULL default '', `lst_sos` char(3) NOT NULL default '', `esd` char(4) NOT NULL default '', `stor` char(3) NOT NULL default '', `d_t` char(4) NOT NULL default '', `ctasc` char(10) NOT NULL default '', PRIMARY KEY (`dic`,`niin`,`fr_ric`,`don`,`suf`,`dte_txn`,`sta`), KEY `don` (`don`), KEY `niin` (`niin`), KEY `stor` (`stor`), KEY `dic` (`dic`), KEY `ctasc` (`ctasc`), KEY `dte_txn` (`dte_txn`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=900,000,000 PACK_KEYS=1 | 1 row in set (0.03 sec) | stat_in | CREATE TABLE `stat_in` ( `dic` char(3) NOT NULL default '', `fr_ric` char(3) NOT NULL default '', `niin` char(11) NOT NULL default '', `ui` char(2) NOT NULL default '', `qty` char(5) NOT NULL default '', `don` char(14) NOT NULL default '', `suf` char(1) NOT NULL default '', `dte_txn` char(5) NOT NULL default '', `ship_to` char(3) NOT NULL default '', `sta` char(2) NOT NULL default '', `lst_sos` char(3) NOT NULL default '', `esd` char(4) NOT NULL default '', `stor` char(3) NOT NULL default '', `d_t` char(4) NOT NULL default '', `ctasc` char(10) NOT NULL default '' ) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=99,999,999 PACK_KEYS=1 | -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
You _could_ try adding an identical primary key to the stat_in table as you have on the 321st_stat table. However, since we need all of the rows from stat_in in the results, I am not sure that it will help speed up the join (because it's a left join). Even though I think the index would be ignored, it is worth a try to see if it would make a difference with a new EXPLAIN.. But it should speed up grouping by PRIMARY KEY in: insert into 321st_stat select * from stat_in group by dic,niin,fr_ric,don,suf,dte_txn,sta; Mikhail. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
If that were a valid SQL statement, I would agree with you. What you wrote **only** functions through a MySQL-specific SQL extension that permits non-aggregated, non-grouped columns to exist in the SELECT clause. I do not understand why you believe that a GROUP BY test will prove any sort of performance gain for his original INSERT problem. It took just over 24 hours to do an INSERT IGNORE to add just about 1000 rows to a 77 million row table. What metric of that performance will your GROUP BY test help to prove? Respecfully puzzled, Shawn Green Database Administrator Unimin Corporation - Spruce Pine Mikhail Entaltsev [EMAIL PROTECTED] wrote on 08/27/2004 04:36:43 PM: You _could_ try adding an identical primary key to the stat_in table as you have on the 321st_stat table. However, since we need all of the rows from stat_in in the results, I am not sure that it will help speed up the join (because it's a left join). Even though I think the index would be ignored, it is worth a try to see if it would make a difference with a new EXPLAIN.. But it should speed up grouping by PRIMARY KEY in: insert into 321st_stat select * from stat_in group by dic,niin,fr_ric,don,suf,dte_txn,sta; Mikhail.
Re: 1 day 28 min insert
If that were a valid SQL statement, I would agree with you. What you wrote **only** functions through a MySQL-specific SQL extension that permits non-aggregated, non-grouped columns to exist in the SELECT clause. He is working with MySQL and that query is working in MySQL. So everything should be fine. :) But I agree that it is MySQL-specific query. I do not understand why you believe that a GROUP BY test will prove any sort of performance gain for his original INSERT problem. It took just over 24 hours to do an INSERT IGNORE to add just about 1000 rows to a 77 million row table. What metric of that performance will your GROUP BY test help to prove? Yes, it is unclear from my previous email. Let's look at that query: insert into 321st_stat select * from stat_in group by dic,niin,fr_ric,don,suf,dte_txn,sta; Why it could be slow? Either select part is slow, either insert or both. If select part is slow then adding KEY (`dic`,`niin`,`fr_ric`,`don`,`suf`,`dte_txn`,`sta`) on stat_in table should help. If select part is fast then IMHO the problem is in number of keys and PACK_KEYS for 321st_stat table. Right? Best regards, Mikhail. - Original Message - From: [EMAIL PROTECTED] To: Mikhail Entaltsev Cc: matt ryan ; [EMAIL PROTECTED] Sent: Friday, August 27, 2004 10:40 PM Subject: Re: 1 day 28 min insert If that were a valid SQL statement, I would agree with you. What you wrote **only** functions through a MySQL-specific SQL extension that permits non-aggregated, non-grouped columns to exist in the SELECT clause. I do not understand why you believe that a GROUP BY test will prove any sort of performance gain for his original INSERT problem. It took just over 24 hours to do an INSERT IGNORE to add just about 1000 rows to a 77 million row table. What metric of that performance will your GROUP BY test help to prove? Respecfully puzzled, Shawn Green Database Administrator Unimin Corporation - Spruce Pine -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
matt ryan wrote: The table is 9 gig, and the index is 8 gig unfortunately the primary unique key is almost every column, if I were to make it one using concat, it would be huge. I tried making those fields a hash, but it did not work, I had duplicate hashes for non duplicate records!! Matt If I well understood, You have in Your index almost all data, You have in Your table? Why not add field for unique key (auto increment if You want less work)? It reduces size of Your primary index and thus speed up working with it! Best regards -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: 1 day 28 min insert
Matt, I know you have gotten a lot of recommendations, I have 3 for you that I don't think anybody has mentioned. 1. Try a merge table. We had 1 table with about 750 million rows in it, and every once in a while we would need to do something crazy to it and it would be locked up for hours. We decided to break it up into 10 different tables, based on the last digit of one of the fields. So whenever we did inserts they went directly into one of the 10 tables, but whenever we did a select if we didn't know which table to search in we used the merge table and it was just fine. This definitely helped us. 2. Get MySQL to come to your office and have them do some consulting. Or have them do it online, personally I recommend onsite consulting. It's not very expensive especially if you can solve your problem. We have MySQL coming out to our office in right over a week from now and the consultant is staying for 3 days. Do I really need the consultant to come in, not really, but if they can tweak, improve, or help us with just a few things, it's worth every penny. For example, two weeks ago, I was about to buy another quad Xeon 3.4ghz machine and before I ordered it I contacted our consultant just to ask their opinion and he told me to go with a 64bit machine. So now I have a quad AMD 64 2.2 with 32gigs of memory coming in next Wednesday for me to play with. You can find the information on the mysql site somewhere. 3. The last option, is a cheaper option than #2, but it can sometimes work just as well. Find the next time mysql will be holding a training class in your area, and go to the class. Then while you are in the class have the instructor look at your tables, queries, etc... and see what they think. Now, I've been to 3 of them now, some of the instructors can baffle you with how much they know. And some of them are more book taught. As one instructor asked me, if you can write a 12 table join without looking at your table structure, why are you here? My response, I have this problem, want to take a look. My problem was resolved by the end of the session. So it's up to you, but personally I recommend #2. You can have them come in and you can have a list of 100 questions, and go right down the list and have them answer everyone of them. I may even have them setup my new little toy when he comes in, you never know. Donny -Original Message- From: Remigiusz Sokoowski [mailto:[EMAIL PROTECTED] Sent: Friday, August 20, 2004 1:12 AM To: matt ryan; [EMAIL PROTECTED] Subject: Re: 1 day 28 min insert matt ryan wrote: The table is 9 gig, and the index is 8 gig unfortunately the primary unique key is almost every column, if I were to make it one using concat, it would be huge. I tried making those fields a hash, but it did not work, I had duplicate hashes for non duplicate records!! Matt If I well understood, You have in Your index almost all data, You have in Your table? Why not add field for unique key (auto increment if You want less work)? It reduces size of Your primary index and thus speed up working with it! Best regards -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
Hi, insert into 321st_stat select * from stat_in group by primary key fields from 321st_stat table; did you try to use this query? Best regards, Mikhail. - Original Message - From: matt ryan [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, August 19, 2004 6:06 PM Subject: 1 day 28 min insert I think oracle parallel query is calling me 110,832,565 stat records 77,269,086 on weekly update, I get small daily files, but daily sql's dont work very well, and miss records, in this case it missed 563 records. mysql update stat_in set ctasc='321ST'; Query OK, 77269086 rows affected (24 min 17.60 sec) Rows matched: 77269086 Changed: 77269086 Warnings: 0 mysql insert ignore into 321st_stat select * from stat_in; Query OK, 563 rows affected (1 day 28 min 35.95 sec) Records: 77269086 Duplicates: 77268523 Warnings: 0 I just cant deal with speeds this slow, an insert onto a table with a primary key that tosses out almost all records shouldnt take this long to do -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
1 day 28 min insert
I think oracle parallel query is calling me 110,832,565 stat records 77,269,086 on weekly update, I get small daily files, but daily sql's dont work very well, and miss records, in this case it missed 563 records. mysql update stat_in set ctasc='321ST'; Query OK, 77269086 rows affected (24 min 17.60 sec) Rows matched: 77269086 Changed: 77269086 Warnings: 0 mysql insert ignore into 321st_stat select * from stat_in; Query OK, 563 rows affected (1 day 28 min 35.95 sec) Records: 77269086 Duplicates: 77268523 Warnings: 0 I just cant deal with speeds this slow, an insert onto a table with a primary key that tosses out almost all records shouldnt take this long to do -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
I'm guessing that you have indexes on the 321st_stat table? If this is the case, try dropping them before you do the insert, then rebuilding them. MySQL is known to be slow at doing bulk inserts on indexed tables. Also, updates are much faster than inserts since with inserts there are much more disk IOs required. Cheers Andrew. - Original Message - From: matt ryan [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, August 19, 2004 5:06 PM Subject: 1 day 28 min insert I think oracle parallel query is calling me 110,832,565 stat records 77,269,086 on weekly update, I get small daily files, but daily sql's dont work very well, and miss records, in this case it missed 563 records. mysql update stat_in set ctasc='321ST'; Query OK, 77269086 rows affected (24 min 17.60 sec) Rows matched: 77269086 Changed: 77269086 Warnings: 0 mysql insert ignore into 321st_stat select * from stat_in; Query OK, 563 rows affected (1 day 28 min 35.95 sec) Records: 77269086 Duplicates: 77268523 Warnings: 0 I just cant deal with speeds this slow, an insert onto a table with a primary key that tosses out almost all records shouldnt take this long to do -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
In the last episode (Aug 19), matt ryan said: I think oracle parallel query is calling me 110,832,565 stat records 77,269,086 on weekly update, I get small daily files, but daily sql's dont work very well, and miss records, in this case it missed 563 records. mysql update stat_in set ctasc='321ST'; Query OK, 77269086 rows affected (24 min 17.60 sec) Rows matched: 77269086 Changed: 77269086 Warnings: 0 This is very fast (53000 updates per second). If you are truncating this table after the following insert, you can skip this step completely by selecting field1,field2,'321ST',field4 (for example) in your INSERT statement instead of selecting *. mysql insert ignore into 321st_stat select * from stat_in; Query OK, 563 rows affected (1 day 28 min 35.95 sec) Records: 77269086 Duplicates: 77268523 Warnings: 0 And this is definitely too slow :) You'll probably have to look at the mysql stats while this query is running to determine exactly what it's doing, since mysql can't run EXPLAIN on INSERT commands. One alternative, since you know you don't have many records to insert, is to pull the IDs of the missing records and insert just those. Do an outer join on the two tables (joining on the primary key), get a list of the IDs of records in stat_in but not in 321st_stat, and add a WHERE id IN (list,of,ids) clause to the end of your INSERT ... SELECT statement. If you're running 4.1, you can use a subquery and embed the first query directly in the INSERT. -- Dan Nelson [EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
One alternative, since you know you don't have many records to insert, is to pull the IDs of the missing records and insert just those. Do an outer join on the two tables (joining on the primary key), get a list of the IDs of records in stat_in but not in 321st_stat, and add a WHERE id IN (list,of,ids) clause to the end of your INSERT ... SELECT statement. If you're running 4.1, you can use a subquery and embed the first query directly in the INSERT. Running 4.0.x something like, select a.*, b.* from a left outer join b on a.col1=b.col1, a.col2=b.col2, a.col3=b.col3 where b.col1 is null into temptable then insert from temptable into table a I think I tried this once, but it ran all day Is there a way to join on an index, instead of on each column? The primary key is 6-8 columns I forget -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
- Original Message - From: matt ryan [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, August 19, 2004 12:06 PM Subject: 1 day 28 min insert I think oracle parallel query is calling me 110,832,565 stat records 77,269,086 on weekly update, I get small daily files, but daily sql's dont work very well, and miss records, in this case it missed 563 records. mysql update stat_in set ctasc='321ST'; Query OK, 77269086 rows affected (24 min 17.60 sec) Rows matched: 77269086 Changed: 77269086 Warnings: 0 mysql insert ignore into 321st_stat select * from stat_in; Query OK, 563 rows affected (1 day 28 min 35.95 sec) Records: 77269086 Duplicates: 77268523 Warnings: 0 I just cant deal with speeds this slow, an insert onto a table with a primary key that tosses out almost all records shouldnt take this long to do Your post is very cryptic and I really don't understand just what you want. Do you just want people to agree that 24+ hours is too long for an insert to take place? If so, I would agree that 24 hours is way too long to insert a *single* record. However, I've never tried inserting 77 million records so I'm not sure if 24 hours is unreasonable for that. Maybe others who have more experience with large databases can comment on that. Or are you asking a question? If so, what exactly *is* your question? Are you worried about why some records were missed or why the insert took so long or both? If you're trying to get some help with a problem, you'll have to give us more information. At the very least, we need to know something about your environment, like which version of MySQL you are using. We're likely to need to know more as we start to understand the question, such as your table definition(s), and we may need to see the code that is doing the inserts to see if we can help you optimize it. We may even need information about your hardware if this is a performance question to see whether the times you are getting are reasonable for your hardware. Rhino -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
Andrew Pattison wrote: I'm guessing that you have indexes on the 321st_stat table? If this is the case, try dropping them before you do the insert, then rebuilding them. MySQL is known to be slow at doing bulk inserts on indexed tables. Also, updates are much faster than inserts since with inserts there are much more disk IOs required. Cheers Table has a large primary key, to keep duplicates out, so I cant drop the index and remove. Also, reindexing the table takes all day, dropping is not an option Would I be better off doing an insert replace on all 17m new records, or only inserting the 500 new records using insert ignore ? -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
In the last episode (Aug 19), matt ryan said: One alternative, since you know you don't have many records to insert, is to pull the IDs of the missing records and insert just those. Do an outer join on the two tables (joining on the primary key), get a list of the IDs of records in stat_in but not in 321st_stat, and add a WHERE id IN (list,of,ids) clause to the end of your INSERT ... SELECT statement. If you're running 4.1, you can use a subquery and embed the first query directly in the INSERT. Running 4.0.x something like, select a.*, b.* from a left outer join b on a.col1=b.col1, a.col2=b.col2, a.col3=b.col3 where b.col1 is null into temptable then insert from temptable into table a I think I tried this once, but it ran all day The select, or the insert? If the select, run an EXPLAIN and see if it's using the wrong index or no index. Is there a way to join on an index, instead of on each column? The primary key is 6-8 columns I forget You have to list each column. SHOW KEYS FROM table or SHOW CREATE TABLE table will list all the keys and which fields are used in each key. -- Dan Nelson [EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
It could save you lots of processing time to pre-collect a list of the records you are actually going to add to the table. I know it sounds counter-intuitive but this has saved me OODLES of time on several occasions (of course it can't work for all situations but in your case where you are NOT inserting so many records I think it will help considerably). CREATE TABLE tmpInsertMe (keyfield int); SELECT s.keyfield FROM stat_in s LEFT JOIN 321st_stat st on st.keyfield = s.keyfield WHERE st.keyfield is null This may take a while to process (depends on if stat_in is also indexed on its keyfield field) but I should take much less time than your 24 minute insert. Now add a key to tmpInsertMe to speed up the JOIN during the insert. On 1000+ records (based on your example statistics) this should take way under a second. ALTER TABLE tmpInsertMe add Key(keyfield); Then you insert to 321st_stat only those records that are new (1000+) so you end up with 1/77000th of the processing required to perform the actual insert as it doesn't need to check for the IGNORE case for all of the other records. INSERT 321st_stat SELECT s.* FROM stat_IN s INNER JOIN tmpInsertMe t ON t.keyfield = s.keyfield; And you are through with tmpInsertMe, so drop it: DROP TABLE tmpInsertMe; Of course, you will need to modify the field name and type of keyfield to match your situation. As I said, it is not exactly an intuitive technique but I tried it once out of desperation and I was HUGELY, INCREDIBLY IMPRESSED with the performance improvement it created. At the time I was merging about 100,000 records into a 3,000,000 record table. Only about 1000 were new, and I had about 20,000 updates to perform, the rest were just old duplicates. Doing direct joins from source table to the destination table was killing me so I tried what I described and it worked very well FWIW, Shawn Green Database Administrator Unimin Corporation - Spruce Pine Andrew Pattison [EMAIL PROTECTED] wrote on 08/19/2004 12:24:18 PM: I'm guessing that you have indexes on the 321st_stat table? If this is the case, try dropping them before you do the insert, then rebuilding them. MySQL is known to be slow at doing bulk inserts on indexed tables. Also, updates are much faster than inserts since with inserts there are much more disk IOs required. Cheers Andrew. - Original Message - From: matt ryan [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, August 19, 2004 5:06 PM Subject: 1 day 28 min insert I think oracle parallel query is calling me 110,832,565 stat records 77,269,086 on weekly update, I get small daily files, but daily sql's dont work very well, and miss records, in this case it missed 563 records. mysql update stat_in set ctasc='321ST'; Query OK, 77269086 rows affected (24 min 17.60 sec) Rows matched: 77269086 Changed: 77269086 Warnings: 0 mysql insert ignore into 321st_stat select * from stat_in; Query OK, 563 rows affected (1 day 28 min 35.95 sec) Records: 77269086 Duplicates: 77268523 Warnings: 0 I just cant deal with speeds this slow, an insert onto a table with a primary key that tosses out almost all records shouldnt take this long to do -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
You are DEFINITELY not going to win any races with INSERT REPLACE. Minimizing the number of records you need to process will improve your times significantly, especially since you need less than .002% of your source data added to your destination table. Shawn Green Database Administrator Unimin Corporation - Spruce Pine matt ryan [EMAIL PROTECTED] wrote on 08/19/2004 12:42:10 PM: Andrew Pattison wrote: I'm guessing that you have indexes on the 321st_stat table? If this is the case, try dropping them before you do the insert, then rebuilding them. MySQL is known to be slow at doing bulk inserts on indexed tables. Also, updates are much faster than inserts since with inserts there are much more disk IOs required. Cheers Table has a large primary key, to keep duplicates out, so I cant drop the index and remove. Also, reindexing the table takes all day, dropping is not an option Would I be better off doing an insert replace on all 17m new records, or only inserting the 500 new records using insert ignore ? -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: 1 day 28 min insert
Matt: You would be very well served to give the table a unique physical key. Alternate or primary, you could then join to this single column. I am not sure how big the table to get, however, so you will definitely want to use a bigint unsigned if you plan to test this theory. Benefits: - You can remove the 6 to 8 columns of the PK in the table you are writing to and replace it with a small integer column. Over 17m inserts and comparisons you can imagine the savings in disk IO. - Easier for the optimizer to choose the correct index. - More likely to get reads straight from the index. - Easier code to write/read/maintain Also, piling the records to be inserted into a temporary table can be much quicker. This is my experience with tables of 2 to 3 million rows. B/c this operation may still be long I do not recommend the use of a temporary table, but rather a regular table that you create and destroy as part of the process. Are you using MyIsam or InnoDB? If MyIsam, then if you are scheduling the insert as delayed and the table is still being read from, you may be experiencing an issue where there are enough reads to keep the insert from getting started. Also if MyIsam, is your row pointer large enough? If not this will slow you down too. Have the tables been analyzed and checked lately? Really, there are a number of server parameters to check. Setting this will be very unique to your situation. Oracle is the same way, esp when clustering. Seems to me that spending $1500 or so on some MySQL consulting would be much less expensive than an Oracle license, plus the cost of porting, plus the consulting it would take to get your Oracle server tuned for your app. Best Regards, Boyd E. Hemphill MySQL Certified Professional [EMAIL PROTECTED] Triand, Inc. www.triand.com O: (512) 248-2287 M: (713) 252-4688 -Original Message- From: Dan Nelson [mailto:[EMAIL PROTECTED] Sent: Thursday, August 19, 2004 10:48 AM To: matt ryan Cc: [EMAIL PROTECTED] Subject: Re: 1 day 28 min insert In the last episode (Aug 19), matt ryan said: One alternative, since you know you don't have many records to insert, is to pull the IDs of the missing records and insert just those. Do an outer join on the two tables (joining on the primary key), get a list of the IDs of records in stat_in but not in 321st_stat, and add a WHERE id IN (list,of,ids) clause to the end of your INSERT ... SELECT statement. If you're running 4.1, you can use a subquery and embed the first query directly in the INSERT. Running 4.0.x something like, select a.*, b.* from a left outer join b on a.col1=b.col1, a.col2=b.col2, a.col3=b.col3 where b.col1 is null into temptable then insert from temptable into table a I think I tried this once, but it ran all day The select, or the insert? If the select, run an EXPLAIN and see if it's using the wrong index or no index. Is there a way to join on an index, instead of on each column? The primary key is 6-8 columns I forget You have to list each column. SHOW KEYS FROM table or SHOW CREATE TABLE table will list all the keys and which fields are used in each key. -- Dan Nelson [EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: 1 day 28 min insert
The table is 9 gig, and the index is 8 gig unfortunately the primary unique key is almost every column, if I were to make it one using concat, it would be huge. I tried making those fields a hash, but it did not work, I had duplicate hashes for non duplicate records!! Matt -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]