Re: Nested query with GROUP BY ... HAVING COUNT(*) ... hangs 4.1.3 beta

Leo Siefert Tue, 20 Jul 2004 11:23:53 -0700

Thanks for your response.

>>> <[EMAIL PROTECTED]> 7/19/2004 11:47:39 AM >>>
>It looks like your IN statement is forcing your inner SELECT to
execute 
>once PER ROW of your main table. It is asking the engine to make sure
that 
>_each and every_  id value in main meets the condition in the inner 
>select. So, for each and every value in the table main, it has to 
>re-computing the inner query and scan the results for matches.


Not sure why this would happen.  The nested query is not correlated to
the outer query, so I would expect it to be executed only once.  I have
tried the same query with even larger file sizes on other data managers
and not had this problem. (In fact, I copied the query from an existing
FoxPro program.)

Also, in my production app, the actual queries being run ar much more
complex, including multiple nested queries, and only with the having
clause is there ever a problem.

>I would change it to a JOIN against an anonymous view and test again
-

>SELECT m.*
>FROM main m
>INNER JOIN (SELECT main_ID
>        FROM receipt
>        GROUP BY main_ID
>        HAVING COUNT(1) > 5) as r
>ON m.id = r.main_ID

This query actually does run quickly.  Thanks - I will try to work the
syntax into my query generator.

>- or to break it into two tables for some real speed -

>CREATE TEMPORARY TABLE tmpR
>SELECT main_ID
>FROM receipt
>GROUP BY main_ID
>HAVING COUNT(1) > 5;

>alter table tmpR add key(main_Id);

>SELECT m.*
>FROM main m
>INNER JOIN tmpR r
>on m.ID = r.main_ID;

>DROP TABLE tmpR;

This is actually the second scenario I had tried, as noted in my
original post, and it does yeild better results than the nested query,
but still takes an incredibly long time to run.

>Either method should avoid the re-execution of the subselect for every
row 
>in your primary table. Adding the index to the temporary table will
make 
>the last select really fly.

>Yours,
>Shawn Green
>Database Administrator
>Unimin Corporation - Spruce Pine


Thanks for the info, and for the query syntax to work around the
problem.  I still think this is a bug in processing the nested query,
and if it is runing the subquery for each line in the master table, i
think that is incorrect.

 - Leo Siefert







"Leo Siefert" <[EMAIL PROTECTED]> wrote on 07/19/2004
11:22:39 
AM:

> OS: Win2k Server
> MySQL: 4.1.1 alpha / 4.1.3 beta
>    Table type: INNO DB
> 
> In my production environment, running the query:
> 
> select * from main where id in
> (select main_id from receipt group by main_id having COUNT(*) > 5)
> 
> will hang the server - sometimes for over a day, thugh it seems it
will
> eventually complete working on it if given enough time. Currently
main
> contains ~200,000 records and receipt contains ~16,000. Main records
> with any receipts have an average of ~10 receipts, but most have
none. 
> 
> Created a smaller test database:
> 
> master
>    id   int(6)  primary  autoincrement
>    name  varchar(25)  (filled with random 10 char strings)
> 
> detail
>    id   int(6)  primary  autoincrement
>    master_id   int(6)  index  (filled with random ints <=
> max(master.id))
>    detail  varchar(25)  (filled with random 10 char strings)
> 
> temp
>    id int(6) index
> 
> Fill master with 1,000 records, detail with 10,000.
> Clone and fill master with 10,000 records, detail with 100,000.
> 
> Query:
> 
> select * from master where master.id in
> (select master_id from detail group by master_id having COUNT(*) >
2)
> 
> (small) returns 76 rows in 13 seconds.
> (large) returns 496 rows in 566 seconds. (COUNT(*) > 15)
> 
> Tried a two part query, sending the intermediate results to a
temporary
> table:
> 
> create temporary table t select master_id from detail group by
> master_id having COUNT(*) > 2;
> select * from master where master.id in (select master_id from t);
> drop table t;
> 
> (small) returns 76 rows in 2.8 seconds.
> (large) returns 496 rows in 17 seconds.
> 
> Running the intermediate results into a permanent table:
> 
> truncate table temp;
> insert into temp select master_id from detail group by master_id
having
> COUNT(*) > 2;
> select * from master where master.id in (select id from temp);
> 
> (small) returns 76 rows in 0.16 seconds.
> (large) returns 496 rows in 0.17 seconds.
> 
> Have tried playing around with some of the system variables:
> query_cache_size, innodb_buffer_pool_size with no real affect.
> 
> In our production environment (record size is much larger, similar
> number of records to the large test set), both the nested query and
the
> two-part query using a temporary query hang for indeterminate (> 6
hrs)
> amounts of time, leaving the use of a permanent table as the only
> option. Of course, the only real way to manage this is to create a
> dedicated scratch table for each user of the system, a somewhat
onerous
> workaround.
> 
> Anyone have an idea on a solution to this? Is there something in
> setting up for INNO DB that I am missing, or should I file this as a
> bug?
> 
> Thanks.
> 
>  - Leo
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql 
> To unsubscribe:   
http://lists.mysql.com/[EMAIL PROTECTED] 
> 

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Nested query with GROUP BY ... HAVING COUNT(*) ... hangs 4.1.3 beta

Reply via email to