SLOW 22million rows, 5 hour query?

2004-04-24 Thread lists
I have a table with 22,371,273 rows, current type is MyISAM.  I have one
query tha took nearly 5 hours to complete.  I do not know if this like
it should be or not, but I thought I'd ask the list.  My gut feeling is
the `gropu by` that's slowing it down but nonetheless 5 hours seems 
excessive.  I'm trying to find the source of the limitation  and work
from there.

Mysql:
4.0.18-standard, precompiled 32-bit sparc 2.8

Server:
Sun 420
Solaris 2.8
4x450MHZ Ultrasparc-II
4GB Ram
Two 50gb mounts, fiber channel scsi to EMC.
Brand new systems, totally idle.

I think everything relevant is here.  The skinny: full table scan on 
22 million rows, group and insert into new table.  MyISAM and InnoDB 
appear to give similar results when used as the destination
('new_table').

insert into new_table
select month_day, floor(bucket/3) as bucket, date, src, avg(value) as
value
from source_table
group by month_day, bucket, src;

Relevant `explain` details:
Full table scan: 22,371,273 rows, Using temporary; Using filesort

Query OK, 11495208 rows affected (4 hours 47 min 21.01 sec)
Records: 11495208  Duplicates: 0  Warnings: 0

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: SLOW 22million rows, 5 hour query?

2004-04-24 Thread Jigal van Hemert
 insert into new_table
 select month_day, floor(bucket/3) as bucket, date, src, avg(value) as
 value
 from source_table
 group by month_day, bucket, src;

 Relevant `explain` details:
 Full table scan: 22,371,273 rows, Using temporary; Using filesort

 Query OK, 11495208 rows affected (4 hours 47 min 21.01 sec)

IMHO two things slow you down:
1) floor(bucket/3) as bucket. This means that MySQL has to calculate this
value for each record to be able to use it as a grouping value. If you
always use this same expression for grouping, it might be useful to have a
column with the precalculated value.
2) there's only one table involved, so only one index will be used. If you
had an index that contains month_day, bucket and src it would speed up
things perhaps.

Furthermore, it depends on the speed of the disks, your configuration,
memory configuration and use how fast it will go. But explain indicates that
MySQL needs to copy the data into a temporary table and use filesort to
order and group things.

Maybe it will help to surpress the automatic sorting that is done by MySQL
because of the GROUP BY, by adding ORDER BY NULL?

Regards, Jigal.


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: SLOW 22million rows, 5 hour query?

2004-04-24 Thread Tim Cutts
On 23 Apr 2004, at 9:48 pm, [EMAIL PROTECTED] wrote:

Relevant `explain` details:
Full table scan: 22,371,273 rows, Using temporary; Using filesort
The filesort is a giveaway.  Can you increase the sort buffer size so 
that the sort can happen in memory rather than having to use a file to 
sort?

Tim

--
Dr Tim Cutts
Informatics Systems Group
Wellcome Trust Sanger Institute
Hinxton, Cambridge, CB10 1SA, UK
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


RE: SLOW 22million rows, 5 hour query?

2004-04-24 Thread Donny Simonton
Use insert delayed, and you will cut your time in half.  At least with my
experience.  But also how long does it actually take to run the query
itself.  Giving a summary explain doesn't help much.  You really need a
table structure that the select is using and a full explain.

Donny

 -Original Message-
 From: Tim Cutts [mailto:[EMAIL PROTECTED]
 Sent: Saturday, April 24, 2004 6:02 AM
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: SLOW 22million rows, 5 hour query?
 
 
 On 23 Apr 2004, at 9:48 pm, [EMAIL PROTECTED] wrote:
 
  Relevant `explain` details:
  Full table scan: 22,371,273 rows, Using temporary; Using filesort
 
 The filesort is a giveaway.  Can you increase the sort buffer size so
 that the sort can happen in memory rather than having to use a file to
 sort?
 
 Tim
 
 --
 Dr Tim Cutts
 Informatics Systems Group
 Wellcome Trust Sanger Institute
 Hinxton, Cambridge, CB10 1SA, UK
 
 
 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:
 http://lists.mysql.com/[EMAIL PROTECTED]




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: SLOW 22million rows, 5 hour query?

2004-04-24 Thread Cliff Daniel
I must have done a poor job at explaining this.  I already have all
those primary keys, etc.  If you are required to do a full table scan on 
the table, as I am in my example, please explain to me how an index even 
matters in that case.  There is _no_ where clause in the query so how
will an index help? :)

Now I have been benchmarking creating a CKEY combined key column that
just basically is a concat(month_day, ',', bucket, ',', src).  I made
that a char(44) column and make that the primary key.  Things seem faster 
due to use only one row for the primary key instead of 3, as well as a less
complex group by.  Preliminary results show the CKEY to be 50% faster
on the particular query I'm using.

1) Yes, it does have to calculate floor(bucket / 3) but that is fairly
in expensive call on a per-row basis, one would think.

2) I already have that index.  It's a Primary Key(month_day, src,
bucket).  I'm still confused on how the index will speed it up
on the source table side.

We are NOT I/O bound.  Looks more like cpu bound to me.  Mysql uses 25%
cpu on the solaris which is 1 entire cpu on a 4 processor machine.

I'll give the order by NULL a shot
Cliff

Jigal van Hemert [EMAIL PROTECTED] writes:
  insert into new_table
  select month_day, floor(bucket/3) as bucket, date, src, avg(value) as
  value
  from source_table
  group by month_day, bucket, src;
 
  Relevant `explain` details:
  Full table scan: 22,371,273 rows, Using temporary; Using filesort
 
  Query OK, 11495208 rows affected (4 hours 47 min 21.01 sec)
 
 IMHO two things slow you down:
 1) floor(bucket/3) as bucket. This means that MySQL has to calculate this
 value for each record to be able to use it as a grouping value. If you
 always use this same expression for grouping, it might be useful to have a
 column with the precalculated value.
 2) there's only one table involved, so only one index will be used. If you
 had an index that contains month_day, bucket and src it would speed up
 things perhaps.
 
 Furthermore, it depends on the speed of the disks, your configuration,
 memory configuration and use how fast it will go. But explain indicates that
 MySQL needs to copy the data into a temporary table and use filesort to
 order and group things.
 
 Maybe it will help to surpress the automatic sorting that is done by MySQL
 because of the GROUP BY, by adding ORDER BY NULL?
 
 Regards, Jigal.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: SLOW 22million rows, 5 hour query?

2004-04-24 Thread Cliff Daniel
http://dev.mysql.com/doc/mysql/en/INSERT_SELECT.html

 Do not use DELAYED with INSERT ... SELECT. 


With respect to the table structure...can you explain how when you have
to read every single row regardless how the structure (assuming you are
going down the path of idexes) affects the query?

Cliff

Donny Simonton [EMAIL PROTECTED] writes:
 Use insert delayed, and you will cut your time in half.  At least with my
 experience.  But also how long does it actually take to run the query
 itself.  Giving a summary explain doesn't help much.  You really need a
 table structure that the select is using and a full explain.
 
 Donny
 
  -Original Message-
  From: Tim Cutts [mailto:[EMAIL PROTECTED]
  Sent: Saturday, April 24, 2004 6:02 AM
  To: [EMAIL PROTECTED]
  Cc: [EMAIL PROTECTED]
  Subject: Re: SLOW 22million rows, 5 hour query?
  
  
  On 23 Apr 2004, at 9:48 pm, [EMAIL PROTECTED] wrote:
  
   Relevant `explain` details:
   Full table scan: 22,371,273 rows, Using temporary; Using filesort
  
  The filesort is a giveaway.  Can you increase the sort buffer size so
  that the sort can happen in memory rather than having to use a file to
  sort?
  
  Tim
  
  --
  Dr Tim Cutts
  Informatics Systems Group
  Wellcome Trust Sanger Institute
  Hinxton, Cambridge, CB10 1SA, UK
  
  
  --
  MySQL General Mailing List
  For list archives: http://lists.mysql.com/mysql
  To unsubscribe:
  http://lists.mysql.com/[EMAIL PROTECTED]
 
 
 
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: SLOW 22million rows, 5 hour query?

2004-04-24 Thread Donny Simonton
Cliff, still no explain still not table structure.  Until that happens enjoy
the 5 hour club.

Donny

 -Original Message-
 From: Cliff Daniel [mailto:[EMAIL PROTECTED]
 Sent: Saturday, April 24, 2004 6:41 PM
 To: Donny Simonton
 Cc: 'Tim Cutts'; [EMAIL PROTECTED]
 Subject: Re: SLOW 22million rows, 5 hour query?
 
 http://dev.mysql.com/doc/mysql/en/INSERT_SELECT.html
 
  Do not use DELAYED with INSERT ... SELECT. 
 
 
 With respect to the table structure...can you explain how when you have
 to read every single row regardless how the structure (assuming you are
 going down the path of idexes) affects the query?
 
 Cliff
 
 Donny Simonton [EMAIL PROTECTED] writes:
  Use insert delayed, and you will cut your time in half.  At least with
 my
  experience.  But also how long does it actually take to run the query
  itself.  Giving a summary explain doesn't help much.  You really need a
  table structure that the select is using and a full explain.
 
  Donny
 
   -Original Message-
   From: Tim Cutts [mailto:[EMAIL PROTECTED]
   Sent: Saturday, April 24, 2004 6:02 AM
   To: [EMAIL PROTECTED]
   Cc: [EMAIL PROTECTED]
   Subject: Re: SLOW 22million rows, 5 hour query?
  
  
   On 23 Apr 2004, at 9:48 pm, [EMAIL PROTECTED] wrote:
  
Relevant `explain` details:
Full table scan: 22,371,273 rows, Using temporary; Using filesort
  
   The filesort is a giveaway.  Can you increase the sort buffer size so
   that the sort can happen in memory rather than having to use a file to
   sort?
  
   Tim
  
   --
   Dr Tim Cutts
   Informatics Systems Group
   Wellcome Trust Sanger Institute
   Hinxton, Cambridge, CB10 1SA, UK
  
  
   --
   MySQL General Mailing List
   For list archives: http://lists.mysql.com/mysql
   To unsubscribe:
   http://lists.mysql.com/[EMAIL PROTECTED]
 
 
 
 
  --
  MySQL General Mailing List
  For list archives: http://lists.mysql.com/mysql
  To unsubscribe:
 http://lists.mysql.com/[EMAIL PROTECTED]
 
 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:
 http://lists.mysql.com/[EMAIL PROTECTED]




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]