Re: 1 day 28 min insert

2004-09-01 Thread matt ryan
Mikhail Entaltsev wrote:
You _could_ try adding an identical primary key to the stat_in table as
you have on the 321st_stat table. However, since we need all of the rows
from stat_in in the results, I am not sure that it will help speed up the
join (because it's a left join). Even though I think the index would be
ignored, it is worth a try to see if it would make a difference with a new
EXPLAIN.
   

Tried this, setup a matching index on the temp table, it took 17 hours 
to load the input file into the temp stat_in table, so it's definitly 
not going to save me any time

I've used enable/disable keys before, but 2 problems, one it was only 5 
% faster, and two, I will have primary key violations when I enable the 
primary key, it wont enable it, at least that's my understanding of it.

Matt
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: 1 day 28 min insert

2004-08-27 Thread matt ryan
Mikhail Entaltsev wrote:
Hi,
insert into 321st_stat select * from stat_in group by primary key fields
from 321st_stat table;
did you try to use this query?
Best regards,
Mikhail.
 

Ran it, it took at least 24 hours, it finished but never gave me the 
total time, when I checked the server mysql dropped me back to the 
command prompt, with no time or number of records :(

Matt
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: 1 day 28 min insert

2004-08-27 Thread Mikhail Entaltsev
Could you execute show create table 321st_stat and show create table
stat_in
and send results back?

Best regards, Mikhail.


- Original Message - 
From: matt ryan [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, August 27, 2004 4:45 PM
Subject: Re: 1 day 28 min insert


 Mikhail Entaltsev wrote:

 Hi,
 
 insert into 321st_stat select * from stat_in group by primary key fields
 from 321st_stat table;
 
 did you try to use this query?
 
 Best regards,
 Mikhail.
 
 

 Ran it, it took at least 24 hours, it finished but never gave me the
 total time, when I checked the server mysql dropped me back to the
 command prompt, with no time or number of records :(

 Matt

 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: 1 day 28 min insert

2004-08-27 Thread matt ryan
Mikhail Entaltsev wrote:
Could you execute show create table 321st_stat and show create table
stat_in
and send results back?
 

I have no key's on the temp table, stat_in, do you think adding keys on 
the whole primary key would be faster?

I wasnt sure if you could join mysql keys, the key is called primary 
key so would it be a.primary key = b.primary key ?


mysql explain select a.* from stat_in a left outer join 321st_stat b on 
a.don=b.don and a.dic=b.dic and a.niin=b.niin and a.sta=b.sta and 
a.qty=b.qty and a.fr_ric=b.fr_ric and a.suf=b.suf and a.dte_txn 
=b.dte_txn where isnull(b.don);

| id | select_type | table | type   | possible_keys| 
key | key_len | ref| rows | 
Extra   |
|  1 | SIMPLE  | a | ALL| NULL | 
NULL|  NULL | NULL| 77269086 
| |
|  1 | SIMPLE  | b | eq_ref | PRIMARY,don,niin,dic,dte_txn | 
PRIMARY |39 | 
finlog.a.dic,finlog.a.niin,finlog.a.fr_ric,finlog.a.don,finlog.a.suf,finlog.a.dte_txn,finlog.a.sta
| 1 | Using where; Not exists |
2 rows in set (0.11 sec)

---+
| 321st_stat | CREATE TABLE `321st_stat` (
 `dic` char(3) NOT NULL default '',
 `fr_ric` char(3) NOT NULL default '',
 `niin` char(11) NOT NULL default '',
 `ui` char(2) NOT NULL default '',
 `qty` char(5) NOT NULL default '',
 `don` char(14) NOT NULL default '',
 `suf` char(1) NOT NULL default '',
 `dte_txn` char(5) NOT NULL default '',
 `ship_to` char(3) NOT NULL default '',
 `sta` char(2) NOT NULL default '',
 `lst_sos` char(3) NOT NULL default '',
 `esd` char(4) NOT NULL default '',
 `stor` char(3) NOT NULL default '',
 `d_t` char(4) NOT NULL default '',
 `ctasc` char(10) NOT NULL default '',
 PRIMARY KEY  (`dic`,`niin`,`fr_ric`,`don`,`suf`,`dte_txn`,`sta`),
 KEY `don` (`don`),
 KEY `niin` (`niin`),
 KEY `stor` (`stor`),
 KEY `dic` (`dic`),
 KEY `ctasc` (`ctasc`),
 KEY `dte_txn` (`dte_txn`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=900,000,000 PACK_KEYS=1 |
1 row in set (0.03 sec)
| stat_in | CREATE TABLE `stat_in` (
 `dic` char(3) NOT NULL default '',
 `fr_ric` char(3) NOT NULL default '',
 `niin` char(11) NOT NULL default '',
 `ui` char(2) NOT NULL default '',
 `qty` char(5) NOT NULL default '',
 `don` char(14) NOT NULL default '',
 `suf` char(1) NOT NULL default '',
 `dte_txn` char(5) NOT NULL default '',
 `ship_to` char(3) NOT NULL default '',
 `sta` char(2) NOT NULL default '',
 `lst_sos` char(3) NOT NULL default '',
 `esd` char(4) NOT NULL default '',
 `stor` char(3) NOT NULL default '',
 `d_t` char(4) NOT NULL default '',
 `ctasc` char(10) NOT NULL default ''
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=99,999,999 PACK_KEYS=1 |
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: 1 day 28 min insert

2004-08-27 Thread SGreen
I see two things I would change:

First, the column 321st_stat.dic is the first column of your primary key 
and has a second index on just it. That second index is redundant and 
could be deleted.

Second, in your WHERE clause you say : WHERE isnull(b.don) .  That forces 
the engine to run a function on every row of the b.don column to return a 
logical value. The simpler and faster thing to have said is :  WHERE b.don 
is null . That is a direct, native comparison and will use an index if one 
is available.

You _could_ try adding an identical primary key to the stat_in table as 
you have on the 321st_stat table. However, since we need all of the rows 
from stat_in in the results, I am not sure that it will help speed up the 
join (because it's a left join). Even though I think the index would be 
ignored, it is worth a try to see if it would make a difference with a new 
EXPLAIN..

Have you considered wrapping your insert with :
ALTER TABLE `321st_stat` DISABLE KEYS
ALTER TABLE `321st_stat` ENABLE KEYS
?


matt ryan [EMAIL PROTECTED] wrote on 08/27/2004 03:25:58 PM:

 Mikhail Entaltsev wrote:
 
 Could you execute show create table 321st_stat and show create table
 stat_in
 and send results back?
  
 
 
 I have no key's on the temp table, stat_in, do you think adding keys on 
 the whole primary key would be faster?
 
 I wasnt sure if you could join mysql keys, the key is called primary 
 key so would it be a.primary key = b.primary key ?
 
 
 
 
 mysql explain select a.* from stat_in a left outer join 321st_stat b on 

 a.don=b.don and a.dic=b.dic and a.niin=b.niin and a.sta=b.sta and 
 a.qty=b.qty and a.fr_ric=b.fr_ric and a.suf=b.suf and a.dte_txn 
 =b.dte_txn where isnull(b.don);
 
 | id | select_type | table | type   | possible_keys| 
 key | key_len | ref| rows | 
 Extra   |
 |  1 | SIMPLE  | a | ALL| NULL | 
 NULL|  NULL | NULL| 77269086 
 | |
 |  1 | SIMPLE  | b | eq_ref | PRIMARY,don,niin,dic,dte_txn | 
 PRIMARY |39 | 
 finlog.a.dic,finlog.a.niin,finlog.a.fr_ric,finlog.a.don,finlog.a.
 suf,finlog.a.dte_txn,finlog.a.sta
 | 1 | Using where; Not exists |
 2 rows in set (0.11 sec)
 
 ---+
 | 321st_stat | CREATE TABLE `321st_stat` (
   `dic` char(3) NOT NULL default '',
   `fr_ric` char(3) NOT NULL default '',
   `niin` char(11) NOT NULL default '',
   `ui` char(2) NOT NULL default '',
   `qty` char(5) NOT NULL default '',
   `don` char(14) NOT NULL default '',
   `suf` char(1) NOT NULL default '',
   `dte_txn` char(5) NOT NULL default '',
   `ship_to` char(3) NOT NULL default '',
   `sta` char(2) NOT NULL default '',
   `lst_sos` char(3) NOT NULL default '',
   `esd` char(4) NOT NULL default '',
   `stor` char(3) NOT NULL default '',
   `d_t` char(4) NOT NULL default '',
   `ctasc` char(10) NOT NULL default '',
   PRIMARY KEY  (`dic`,`niin`,`fr_ric`,`don`,`suf`,`dte_txn`,`sta`),
   KEY `don` (`don`),
   KEY `niin` (`niin`),
   KEY `stor` (`stor`),
   KEY `dic` (`dic`),
   KEY `ctasc` (`ctasc`),
   KEY `dte_txn` (`dte_txn`)
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=900,000,000 PACK_KEYS=1 
|
 1 row in set (0.03 sec)
 
 | stat_in | CREATE TABLE `stat_in` (
   `dic` char(3) NOT NULL default '',
   `fr_ric` char(3) NOT NULL default '',
   `niin` char(11) NOT NULL default '',
   `ui` char(2) NOT NULL default '',
   `qty` char(5) NOT NULL default '',
   `don` char(14) NOT NULL default '',
   `suf` char(1) NOT NULL default '',
   `dte_txn` char(5) NOT NULL default '',
   `ship_to` char(3) NOT NULL default '',
   `sta` char(2) NOT NULL default '',
   `lst_sos` char(3) NOT NULL default '',
   `esd` char(4) NOT NULL default '',
   `stor` char(3) NOT NULL default '',
   `d_t` char(4) NOT NULL default '',
   `ctasc` char(10) NOT NULL default ''
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=99,999,999 PACK_KEYS=1 |
 
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
 


Re: 1 day 28 min insert

2004-08-27 Thread Mikhail Entaltsev
First of all, IMHO  index 321st_stat.dic is  useless since you have dic as
the first field in 321st_stat.PRIMARY KEY.
Second, I would recommend add KEY
(`dic`,`niin`,`fr_ric`,`don`,`suf`,`dte_txn`,`sta`) on stat_in table
and set PACK_KEYS=0 for stat_in table.

Then measure execution time of

select * from stat_in group by dic,niin,fr_ric,don,suf,dte_txn,sta;

If it is relatively short then the problem is in number of keys and
PACK_KEYS for 321st_stat table.
Please, let me know about your results.

Best regards,
Mikhail.

- Original Message - 
From: matt ryan [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, August 27, 2004 9:25 PM
Subject: Re: 1 day 28 min insert


 Mikhail Entaltsev wrote:

 Could you execute show create table 321st_stat and show create table
 stat_in
 and send results back?
 
 

 I have no key's on the temp table, stat_in, do you think adding keys on
 the whole primary key would be faster?

 I wasnt sure if you could join mysql keys, the key is called primary
 key so would it be a.primary key = b.primary key ?




 mysql explain select a.* from stat_in a left outer join 321st_stat b on
 a.don=b.don and a.dic=b.dic and a.niin=b.niin and a.sta=b.sta and
 a.qty=b.qty and a.fr_ric=b.fr_ric and a.suf=b.suf and a.dte_txn
 =b.dte_txn where isnull(b.don);

 | id | select_type | table | type   | possible_keys|
 key | key_len | ref| rows |
 Extra   |
 |  1 | SIMPLE  | a | ALL| NULL |
 NULL|  NULL | NULL| 77269086
 | |
 |  1 | SIMPLE  | b | eq_ref | PRIMARY,don,niin,dic,dte_txn |
 PRIMARY |39 |

finlog.a.dic,finlog.a.niin,finlog.a.fr_ric,finlog.a.don,finlog.a.suf,finlog.
a.dte_txn,finlog.a.sta
 | 1 | Using where; Not exists |
 2 rows in set (0.11 sec)

 ---+
 | 321st_stat | CREATE TABLE `321st_stat` (
   `dic` char(3) NOT NULL default '',
   `fr_ric` char(3) NOT NULL default '',
   `niin` char(11) NOT NULL default '',
   `ui` char(2) NOT NULL default '',
   `qty` char(5) NOT NULL default '',
   `don` char(14) NOT NULL default '',
   `suf` char(1) NOT NULL default '',
   `dte_txn` char(5) NOT NULL default '',
   `ship_to` char(3) NOT NULL default '',
   `sta` char(2) NOT NULL default '',
   `lst_sos` char(3) NOT NULL default '',
   `esd` char(4) NOT NULL default '',
   `stor` char(3) NOT NULL default '',
   `d_t` char(4) NOT NULL default '',
   `ctasc` char(10) NOT NULL default '',
   PRIMARY KEY  (`dic`,`niin`,`fr_ric`,`don`,`suf`,`dte_txn`,`sta`),
   KEY `don` (`don`),
   KEY `niin` (`niin`),
   KEY `stor` (`stor`),
   KEY `dic` (`dic`),
   KEY `ctasc` (`ctasc`),
   KEY `dte_txn` (`dte_txn`)
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=900,000,000 PACK_KEYS=1 |
 1 row in set (0.03 sec)

 | stat_in | CREATE TABLE `stat_in` (
   `dic` char(3) NOT NULL default '',
   `fr_ric` char(3) NOT NULL default '',
   `niin` char(11) NOT NULL default '',
   `ui` char(2) NOT NULL default '',
   `qty` char(5) NOT NULL default '',
   `don` char(14) NOT NULL default '',
   `suf` char(1) NOT NULL default '',
   `dte_txn` char(5) NOT NULL default '',
   `ship_to` char(3) NOT NULL default '',
   `sta` char(2) NOT NULL default '',
   `lst_sos` char(3) NOT NULL default '',
   `esd` char(4) NOT NULL default '',
   `stor` char(3) NOT NULL default '',
   `d_t` char(4) NOT NULL default '',
   `ctasc` char(10) NOT NULL default ''
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=99,999,999 PACK_KEYS=1 |


 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: 1 day 28 min insert

2004-08-27 Thread Mikhail Entaltsev
 You _could_ try adding an identical primary key to the stat_in table as
 you have on the 321st_stat table. However, since we need all of the rows
 from stat_in in the results, I am not sure that it will help speed up the
 join (because it's a left join). Even though I think the index would be
 ignored, it is worth a try to see if it would make a difference with a new
 EXPLAIN..

But it should speed up grouping by PRIMARY KEY in:

insert into 321st_stat select * from stat_in group by
dic,niin,fr_ric,don,suf,dte_txn,sta;

Mikhail.


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: 1 day 28 min insert

2004-08-27 Thread SGreen
If that were a valid SQL statement, I would agree with you. What you wrote 
**only** functions through a MySQL-specific SQL extension that permits 
non-aggregated, non-grouped columns to exist in the SELECT clause. 

I do not understand why you believe that a GROUP BY test will prove any 
sort of performance gain for his original INSERT problem.  It took just 
over 24 hours to do an INSERT IGNORE to add just about 1000 rows to a 77 
million row table. What metric of that performance will your GROUP BY test 
help to prove?

Respecfully puzzled,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine


Mikhail Entaltsev [EMAIL PROTECTED] wrote on 08/27/2004 
04:36:43 PM:

  You _could_ try adding an identical primary key to the stat_in table 
as
  you have on the 321st_stat table. However, since we need all of the 
rows
  from stat_in in the results, I am not sure that it will help speed up 
the
  join (because it's a left join). Even though I think the index would 
be
  ignored, it is worth a try to see if it would make a difference with a 
new
  EXPLAIN..
 
 But it should speed up grouping by PRIMARY KEY in:
 
 insert into 321st_stat select * from stat_in group by
 dic,niin,fr_ric,don,suf,dte_txn,sta;
 
 Mikhail.
 


Re: 1 day 28 min insert

2004-08-27 Thread Mikhail Entaltsev
 If that were a valid SQL statement, I would agree with you. What you wrote
**only** functions through
 a MySQL-specific SQL extension that permits non-aggregated, non-grouped
columns to exist in the SELECT clause.

He is working with MySQL and that query is working in MySQL. So everything
should be fine. :)
But I agree that it is MySQL-specific query.

 I do not understand why you believe that a GROUP BY test will prove any
sort of performance gain
 for his original INSERT problem.  It took just over 24 hours to do an
INSERT IGNORE
 to add just about 1000 rows to a 77 million row table.
 What metric of that performance will your GROUP BY test help to prove?

Yes, it is unclear from my previous email. Let's look at that query:

insert into 321st_stat
select * from stat_in group by dic,niin,fr_ric,don,suf,dte_txn,sta;

Why it could be slow? Either select part is slow, either insert or both.
If select part is slow then adding KEY
(`dic`,`niin`,`fr_ric`,`don`,`suf`,`dte_txn`,`sta`) on stat_in table
should help.
If select part is fast then IMHO the problem is in number of keys and
PACK_KEYS for 321st_stat table.
Right?

Best regards,
Mikhail.


- Original Message - 
From: [EMAIL PROTECTED]
To: Mikhail Entaltsev
Cc: matt ryan ; [EMAIL PROTECTED]
Sent: Friday, August 27, 2004 10:40 PM
Subject: Re: 1 day 28 min insert

If that were a valid SQL statement, I would agree with you. What you wrote
**only** functions through a MySQL-specific SQL extension that permits
non-aggregated, non-grouped columns to exist in the SELECT clause.

I do not understand why you believe that a GROUP BY test will prove any sort
of performance gain for his original INSERT problem.  It took just over 24
hours to do an INSERT IGNORE to add just about 1000 rows to a 77 million row
table. What metric of that performance will your GROUP BY test help to
prove?

Respecfully puzzled,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: 1 day 28 min insert

2004-08-20 Thread Remigiusz Sokoowski
matt ryan wrote:
The table is 9 gig, and the index is 8 gig
unfortunately the primary unique key is almost every column, if I were 
to make it one using concat, it would be huge.

I tried making those fields a hash, but it did not work, I had 
duplicate hashes for non duplicate records!!

Matt
If I well understood, You have in Your index almost all data, You have 
in Your table?
Why not add field for unique key (auto increment if You want less work)?
It reduces size of Your primary index and thus speed up working with it!

Best regards
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


RE: 1 day 28 min insert

2004-08-20 Thread Donny Simonton
Matt,
I know you have gotten a lot of recommendations, I have 3 for you that I
don't think anybody has mentioned.

1.  Try a merge table.  We had 1 table with about 750 million rows in it,
and every once in a while we would need to do something crazy to it and it
would be locked up for hours.  We decided to break it up into 10 different
tables, based on the last digit of one of the fields.  So whenever we did
inserts they went directly into one of the 10 tables, but whenever we did a
select if we didn't know which table to search in we used the merge table
and it was just fine.  This definitely helped us.
2.  Get MySQL to come to your office and have them do some consulting.  Or
have them do it online, personally I recommend onsite consulting.  It's not
very expensive especially if you can solve your problem.  We have MySQL
coming out to our office in right over a week from now and the consultant is
staying for 3 days.  Do I really need the consultant to come in, not really,
but if they can tweak, improve, or help us with just a few things, it's
worth every penny.  For example, two weeks ago, I was about to buy another
quad Xeon 3.4ghz machine and before I ordered it I contacted our consultant
just to ask their opinion and he told me to go with a 64bit machine.  So now
I have a quad AMD 64 2.2 with 32gigs of memory coming in next Wednesday for
me to play with.  You can find the information on the mysql site somewhere.
3.  The last option, is a cheaper option than #2, but it can sometimes work
just as well.  Find the next time mysql will be holding a training class in
your area, and go to the class.  Then while you are in the class have the
instructor look at your tables, queries, etc... and see what they think.
Now, I've been to 3 of them now, some of the instructors can baffle you with
how much they know.  And some of them are more book taught.  As one
instructor asked me, if you can write a 12 table join without looking at
your table structure, why are you here?  My response, I have this problem,
want to take a look.  My problem was resolved by the end of the session.

So it's up to you, but personally I recommend #2.  You can have them come in
and you can have a list of 100 questions, and go right down the list and
have them answer everyone of them.  I may even have them setup my new little
toy when he comes in, you never know.

Donny

 -Original Message-
 From: Remigiusz Sokoowski [mailto:[EMAIL PROTECTED]
 Sent: Friday, August 20, 2004 1:12 AM
 To: matt ryan; [EMAIL PROTECTED]
 Subject: Re: 1 day 28 min insert
 
 matt ryan wrote:
 
  The table is 9 gig, and the index is 8 gig
 
  unfortunately the primary unique key is almost every column, if I were
  to make it one using concat, it would be huge.
 
  I tried making those fields a hash, but it did not work, I had
  duplicate hashes for non duplicate records!!
 
  Matt
 
 If I well understood, You have in Your index almost all data, You have
 in Your table?
 Why not add field for unique key (auto increment if You want less work)?
 It reduces size of Your primary index and thus speed up working with it!
 
 Best regards
 
 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:
 http://lists.mysql.com/[EMAIL PROTECTED]
 




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: 1 day 28 min insert

2004-08-20 Thread Mikhail Entaltsev
Hi,

insert into 321st_stat select * from stat_in group by primary key fields
from 321st_stat table;

did you try to use this query?

Best regards,
Mikhail.



- Original Message - 
From: matt ryan [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 6:06 PM
Subject: 1 day 28 min insert


 I think oracle parallel query is calling me

 110,832,565 stat records

 77,269,086 on weekly update, I get small daily files, but daily sql's
 dont work very well, and miss records, in this case it missed 563 records.

 mysql update stat_in set ctasc='321ST';
 Query OK, 77269086 rows affected (24 min 17.60 sec)
 Rows matched: 77269086  Changed: 77269086  Warnings: 0

 mysql insert ignore into 321st_stat select * from stat_in;
 Query OK, 563 rows affected (1 day 28 min 35.95 sec)
 Records: 77269086  Duplicates: 77268523  Warnings: 0

 I just cant deal with speeds this slow, an insert onto a table with a
 primary key that tosses out almost all records shouldnt take this long to
do

 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



1 day 28 min insert

2004-08-19 Thread matt ryan
I think oracle parallel query is calling me
110,832,565 stat records
77,269,086 on weekly update, I get small daily files, but daily sql's 
dont work very well, and miss records, in this case it missed 563 records.

mysql update stat_in set ctasc='321ST';
Query OK, 77269086 rows affected (24 min 17.60 sec)
Rows matched: 77269086  Changed: 77269086  Warnings: 0
mysql insert ignore into 321st_stat select * from stat_in;
Query OK, 563 rows affected (1 day 28 min 35.95 sec)
Records: 77269086  Duplicates: 77268523  Warnings: 0
I just cant deal with speeds this slow, an insert onto a table with a 
primary key that tosses out almost all records shouldnt take this long to do

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: 1 day 28 min insert

2004-08-19 Thread Andrew Pattison
I'm guessing that you have indexes on the 321st_stat table? If this is the 
case, try dropping them before you do the insert, then rebuilding them. 
MySQL is known to be slow at doing bulk inserts on indexed tables. Also, 
updates are much faster than inserts since with inserts there are much more 
disk IOs required.

Cheers
Andrew.
- Original Message - 
From: matt ryan [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 5:06 PM
Subject: 1 day 28 min insert


I think oracle parallel query is calling me
110,832,565 stat records
77,269,086 on weekly update, I get small daily files, but daily sql's dont 
work very well, and miss records, in this case it missed 563 records.

mysql update stat_in set ctasc='321ST';
Query OK, 77269086 rows affected (24 min 17.60 sec)
Rows matched: 77269086  Changed: 77269086  Warnings: 0
mysql insert ignore into 321st_stat select * from stat_in;
Query OK, 563 rows affected (1 day 28 min 35.95 sec)
Records: 77269086  Duplicates: 77268523  Warnings: 0
I just cant deal with speeds this slow, an insert onto a table with a 
primary key that tosses out almost all records shouldnt take this long to 
do

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: 
http://lists.mysql.com/[EMAIL PROTECTED]



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: 1 day 28 min insert

2004-08-19 Thread Dan Nelson
In the last episode (Aug 19), matt ryan said:
 I think oracle parallel query is calling me
 
 110,832,565 stat records
 
 77,269,086 on weekly update, I get small daily files, but daily sql's 
 dont work very well, and miss records, in this case it missed 563 records.
 
 mysql update stat_in set ctasc='321ST';
 Query OK, 77269086 rows affected (24 min 17.60 sec)
 Rows matched: 77269086  Changed: 77269086  Warnings: 0

This is very fast (53000 updates per second).  If you are truncating
this table after the following insert, you can skip this step
completely by selecting field1,field2,'321ST',field4 (for example) in
your INSERT statement instead of selecting *.
 
 mysql insert ignore into 321st_stat select * from stat_in;
 Query OK, 563 rows affected (1 day 28 min 35.95 sec)
 Records: 77269086  Duplicates: 77268523  Warnings: 0

And this is definitely too slow :)  You'll probably have to look at the
mysql stats while this query is running to determine exactly what it's
doing, since mysql can't run EXPLAIN on INSERT commands.  

One alternative, since you know you don't have many records to insert,
is to pull the IDs of the missing records and insert just those.  Do an
outer join on the two tables (joining on the primary key), get a list
of the IDs of records in stat_in but not in 321st_stat, and add a
WHERE id IN (list,of,ids) clause to the end of your INSERT ... SELECT
statement.  If you're running 4.1, you can use a subquery and embed the
first query directly in the INSERT.

-- 
Dan Nelson
[EMAIL PROTECTED]

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: 1 day 28 min insert

2004-08-19 Thread matt ryan

One alternative, since you know you don't have many records to insert,
is to pull the IDs of the missing records and insert just those.  Do an
outer join on the two tables (joining on the primary key), get a list
of the IDs of records in stat_in but not in 321st_stat, and add a
WHERE id IN (list,of,ids) clause to the end of your INSERT ... SELECT
statement.  If you're running 4.1, you can use a subquery and embed the
first query directly in the INSERT.
 

Running 4.0.x
something like, select a.*, b.* from a left outer join b on 
a.col1=b.col1, a.col2=b.col2, a.col3=b.col3 where b.col1 is null into 
temptable

then insert from temptable into table a
I think I tried this once, but it ran all day 

Is there a way to join on an index, instead of on each column?   The 
primary key is 6-8 columns I forget

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: 1 day 28 min insert

2004-08-19 Thread Rhino

- Original Message - 
From: matt ryan [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 12:06 PM
Subject: 1 day 28 min insert


 I think oracle parallel query is calling me

 110,832,565 stat records

 77,269,086 on weekly update, I get small daily files, but daily sql's
 dont work very well, and miss records, in this case it missed 563 records.

 mysql update stat_in set ctasc='321ST';
 Query OK, 77269086 rows affected (24 min 17.60 sec)
 Rows matched: 77269086  Changed: 77269086  Warnings: 0

 mysql insert ignore into 321st_stat select * from stat_in;
 Query OK, 563 rows affected (1 day 28 min 35.95 sec)
 Records: 77269086  Duplicates: 77268523  Warnings: 0

 I just cant deal with speeds this slow, an insert onto a table with a
 primary key that tosses out almost all records shouldnt take this long to
do

Your post is very cryptic and I really don't understand just what you want.

Do you just want people to agree that 24+ hours is too long for an insert to
take place? If so, I would agree that 24 hours is way too long to insert a
*single* record. However, I've never tried inserting 77 million records so
I'm not sure if 24 hours is unreasonable for that. Maybe others who have
more experience with large databases can comment on that.

Or are you asking a question? If so, what exactly *is* your question? Are
you worried about why some records were missed or why the insert took so
long or both?

If you're trying to get some help with a problem, you'll have to give us
more information. At the very least, we need to know something about your
environment, like which version of MySQL you are using. We're likely to need
to know more as we start to understand the question, such as your table
definition(s), and we may need to see the code that is doing the inserts to
see if we can help you optimize it. We may even need information about your
hardware if this is a performance question to see whether the times you are
getting are reasonable for your hardware.

Rhino


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: 1 day 28 min insert

2004-08-19 Thread matt ryan
Andrew Pattison wrote:
I'm guessing that you have indexes on the 321st_stat table? If this is 
the case, try dropping them before you do the insert, then rebuilding 
them. MySQL is known to be slow at doing bulk inserts on indexed 
tables. Also, updates are much faster than inserts since with inserts 
there are much more disk IOs required.

Cheers
Table has a large primary key, to keep duplicates out, so I cant drop 
the index and remove.

Also, reindexing the table takes all day, dropping is not an option
Would I be better off doing an insert replace on all  17m new records, 
or only inserting the 500 new records using insert ignore ?

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: 1 day 28 min insert

2004-08-19 Thread Dan Nelson
In the last episode (Aug 19), matt ryan said:
 One alternative, since you know you don't have many records to insert,
 is to pull the IDs of the missing records and insert just those.  Do an
 outer join on the two tables (joining on the primary key), get a list
 of the IDs of records in stat_in but not in 321st_stat, and add a
 WHERE id IN (list,of,ids) clause to the end of your INSERT ... SELECT
 statement.  If you're running 4.1, you can use a subquery and embed the
 first query directly in the INSERT.
 
 Running 4.0.x
 
 something like, select a.*, b.* from a left outer join b on 
 a.col1=b.col1, a.col2=b.col2, a.col3=b.col3 where b.col1 is null into 
 temptable
 
 then insert from temptable into table a
 
 I think I tried this once, but it ran all day 

The select, or the insert?  If the select, run an EXPLAIN and see if
it's using the wrong index or no index.
 
 Is there a way to join on an index, instead of on each column?   The 
 primary key is 6-8 columns I forget

You have to list each column.  SHOW KEYS FROM table or SHOW CREATE
TABLE table will list all the keys and which fields are used in each
key.

-- 
Dan Nelson
[EMAIL PROTECTED]

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: 1 day 28 min insert

2004-08-19 Thread SGreen
It could save you lots of processing time to pre-collect a list of the 
records you are actually going to add to the table.  I know it sounds 
counter-intuitive but this has saved me OODLES of time on several 
occasions (of course it can't work for all situations but in your case 
where you are NOT inserting so many records I think it will help 
considerably).

CREATE TABLE tmpInsertMe (keyfield int);

SELECT s.keyfield
FROM stat_in s
LEFT JOIN 321st_stat st
on st.keyfield = s.keyfield
WHERE st.keyfield is null

This may take a while to process (depends on if stat_in is also indexed on 
its keyfield field) but I should take much less time than your 24 minute 
insert. Now add a key to tmpInsertMe to speed up the JOIN during the 
insert. On 1000+ records (based on your example statistics) this should 
take way under a second.

ALTER TABLE tmpInsertMe add Key(keyfield);

Then you insert to 321st_stat only those records that are new (1000+) so 
you end up with 1/77000th of the processing required to perform the actual 
insert as it doesn't need to check for the IGNORE case for all of the 
other records.

INSERT 321st_stat
SELECT s.*
FROM stat_IN s
INNER JOIN tmpInsertMe t
ON t.keyfield = s.keyfield;

And you are through with tmpInsertMe, so drop it:

DROP TABLE tmpInsertMe;

Of course, you will need to modify the field name and type of keyfield 
to match your situation. As I said, it is not exactly an intuitive 
technique but I tried it once out of desperation and I was HUGELY, 
INCREDIBLY IMPRESSED  with the performance improvement it created. At the 
time I was merging about 100,000 records into a 3,000,000 record table. 
Only about 1000 were new, and I had about 20,000 updates to perform, the 
rest were just old duplicates. Doing direct joins from source table to the 
destination table was killing me so I tried what I described and it worked 
very well

FWIW,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



Andrew Pattison [EMAIL PROTECTED] wrote on 08/19/2004 12:24:18 
PM:

 I'm guessing that you have indexes on the 321st_stat table? If this is 
the 
 case, try dropping them before you do the insert, then rebuilding them. 
 MySQL is known to be slow at doing bulk inserts on indexed tables. Also, 

 updates are much faster than inserts since with inserts there are much 
more 
 disk IOs required.
 
 Cheers
 
 Andrew.
 - Original Message - 
 From: matt ryan [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Thursday, August 19, 2004 5:06 PM
 Subject: 1 day 28 min insert
 
 
 I think oracle parallel query is calling me
 
  110,832,565 stat records
 
  77,269,086 on weekly update, I get small daily files, but daily sql's 
dont 
  work very well, and miss records, in this case it missed 563 records.
 
  mysql update stat_in set ctasc='321ST';
  Query OK, 77269086 rows affected (24 min 17.60 sec)
  Rows matched: 77269086  Changed: 77269086  Warnings: 0
 
  mysql insert ignore into 321st_stat select * from stat_in;
  Query OK, 563 rows affected (1 day 28 min 35.95 sec)
  Records: 77269086  Duplicates: 77268523  Warnings: 0
 
  I just cant deal with speeds this slow, an insert onto a table with a 
  primary key that tosses out almost all records shouldnt take this long 
to 
  do
 
  -- 
  MySQL General Mailing List
  For list archives: http://lists.mysql.com/mysql
  To unsubscribe: 
  http://lists.mysql.com/[EMAIL PROTECTED]
 
  
 
 
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
 


Re: 1 day 28 min insert

2004-08-19 Thread SGreen
You are DEFINITELY not going to win any races with INSERT REPLACE. 
Minimizing the number of records you need to process will improve your 
times significantly, especially since you need less than .002% of your 
source data added to your destination table.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

matt ryan [EMAIL PROTECTED] wrote on 08/19/2004 12:42:10 PM:

 Andrew Pattison wrote:
 
  I'm guessing that you have indexes on the 321st_stat table? If this is 

  the case, try dropping them before you do the insert, then rebuilding 
  them. MySQL is known to be slow at doing bulk inserts on indexed 
  tables. Also, updates are much faster than inserts since with inserts 
  there are much more disk IOs required.
 
  Cheers
 
 Table has a large primary key, to keep duplicates out, so I cant drop 
 the index and remove.
 
 Also, reindexing the table takes all day, dropping is not an option
 
 Would I be better off doing an insert replace on all  17m new records, 
 or only inserting the 500 new records using insert ignore ?
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
 


RE: 1 day 28 min insert

2004-08-19 Thread Boyd E. Hemphill
Matt:

You would be very well served to give the table a unique physical key.
Alternate or primary, you could then join to this single column.  I am not
sure how big the table to get, however, so you will definitely want to use a
bigint unsigned if you plan to test this theory.  

Benefits:
-  You can remove the 6 to 8 columns of the PK in the table you are writing
to and replace it with a small integer column.  Over 17m inserts and
comparisons you can imagine the savings in disk IO.
-  Easier for the optimizer to choose the correct index.
-  More likely to get reads straight from the index.
-  Easier code to write/read/maintain

Also, piling the records to be inserted into a temporary table can be much
quicker.  This is my experience with tables of 2 to 3 million rows.  B/c
this operation may still be long I do not recommend the use of a temporary
table, but rather a regular table that you create and destroy as part of the
process.

Are you using MyIsam or InnoDB?  If MyIsam, then if you are scheduling the
insert as delayed and the table is still being read from, you may be
experiencing an issue where there are enough reads to keep the insert from
getting started.

Also if MyIsam, is your row pointer large enough?  If not this will slow you
down too.  

Have the tables been analyzed and checked lately?

Really, there are a number of server parameters to check.  Setting this will
be very unique to your situation.  Oracle is the same way, esp when
clustering.   Seems to me that spending $1500 or so on some MySQL consulting
would be much less expensive than an Oracle license, plus the cost of
porting, plus the consulting it would take to get your Oracle server tuned
for your app.  

Best Regards,
Boyd E. Hemphill
MySQL Certified Professional
[EMAIL PROTECTED]
Triand, Inc.
www.triand.com
O:  (512) 248-2287
M:  (713) 252-4688

-Original Message-
From: Dan Nelson [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 19, 2004 10:48 AM
To: matt ryan
Cc: [EMAIL PROTECTED]
Subject: Re: 1 day 28 min insert

In the last episode (Aug 19), matt ryan said:
 One alternative, since you know you don't have many records to insert,
 is to pull the IDs of the missing records and insert just those.  Do an
 outer join on the two tables (joining on the primary key), get a list
 of the IDs of records in stat_in but not in 321st_stat, and add a
 WHERE id IN (list,of,ids) clause to the end of your INSERT ... SELECT
 statement.  If you're running 4.1, you can use a subquery and embed the
 first query directly in the INSERT.
 
 Running 4.0.x
 
 something like, select a.*, b.* from a left outer join b on 
 a.col1=b.col1, a.col2=b.col2, a.col3=b.col3 where b.col1 is null into 
 temptable
 
 then insert from temptable into table a
 
 I think I tried this once, but it ran all day 

The select, or the insert?  If the select, run an EXPLAIN and see if
it's using the wrong index or no index.
 
 Is there a way to join on an index, instead of on each column?   The 
 primary key is 6-8 columns I forget

You have to list each column.  SHOW KEYS FROM table or SHOW CREATE
TABLE table will list all the keys and which fields are used in each
key.

-- 
Dan Nelson
[EMAIL PROTECTED]

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: 1 day 28 min insert

2004-08-19 Thread matt ryan
The table is 9 gig, and the index is 8 gig
unfortunately the primary unique key is almost every column, if I were 
to make it one using concat, it would be huge.

I tried making those fields a hash, but it did not work, I had duplicate 
hashes for non duplicate records!!

Matt
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]