Thanks for your response. >>> <[EMAIL PROTECTED]> 7/19/2004 11:47:39 AM >>> >It looks like your IN statement is forcing your inner SELECT to execute >once PER ROW of your main table. It is asking the engine to make sure that >_each and every_ id value in main meets the condition in the inner >select. So, for each and every value in the table main, it has to >re-computing the inner query and scan the results for matches.
Not sure why this would happen. The nested query is not correlated to the outer query, so I would expect it to be executed only once. I have tried the same query with even larger file sizes on other data managers and not had this problem. (In fact, I copied the query from an existing FoxPro program.) Also, in my production app, the actual queries being run ar much more complex, including multiple nested queries, and only with the having clause is there ever a problem. >I would change it to a JOIN against an anonymous view and test again - >SELECT m.* >FROM main m >INNER JOIN (SELECT main_ID > FROM receipt > GROUP BY main_ID > HAVING COUNT(1) > 5) as r >ON m.id = r.main_ID This query actually does run quickly. Thanks - I will try to work the syntax into my query generator. >- or to break it into two tables for some real speed - >CREATE TEMPORARY TABLE tmpR >SELECT main_ID >FROM receipt >GROUP BY main_ID >HAVING COUNT(1) > 5; >alter table tmpR add key(main_Id); >SELECT m.* >FROM main m >INNER JOIN tmpR r >on m.ID = r.main_ID; >DROP TABLE tmpR; This is actually the second scenario I had tried, as noted in my original post, and it does yeild better results than the nested query, but still takes an incredibly long time to run. >Either method should avoid the re-execution of the subselect for every row >in your primary table. Adding the index to the temporary table will make >the last select really fly. >Yours, >Shawn Green >Database Administrator >Unimin Corporation - Spruce Pine Thanks for the info, and for the query syntax to work around the problem. I still think this is a bug in processing the nested query, and if it is runing the subquery for each line in the master table, i think that is incorrect. - Leo Siefert "Leo Siefert" <[EMAIL PROTECTED]> wrote on 07/19/2004 11:22:39 AM: > OS: Win2k Server > MySQL: 4.1.1 alpha / 4.1.3 beta > Table type: INNO DB > > In my production environment, running the query: > > select * from main where id in > (select main_id from receipt group by main_id having COUNT(*) > 5) > > will hang the server - sometimes for over a day, thugh it seems it will > eventually complete working on it if given enough time. Currently main > contains ~200,000 records and receipt contains ~16,000. Main records > with any receipts have an average of ~10 receipts, but most have none. > > Created a smaller test database: > > master > id int(6) primary autoincrement > name varchar(25) (filled with random 10 char strings) > > detail > id int(6) primary autoincrement > master_id int(6) index (filled with random ints <= > max(master.id)) > detail varchar(25) (filled with random 10 char strings) > > temp > id int(6) index > > Fill master with 1,000 records, detail with 10,000. > Clone and fill master with 10,000 records, detail with 100,000. > > Query: > > select * from master where master.id in > (select master_id from detail group by master_id having COUNT(*) > 2) > > (small) returns 76 rows in 13 seconds. > (large) returns 496 rows in 566 seconds. (COUNT(*) > 15) > > Tried a two part query, sending the intermediate results to a temporary > table: > > create temporary table t select master_id from detail group by > master_id having COUNT(*) > 2; > select * from master where master.id in (select master_id from t); > drop table t; > > (small) returns 76 rows in 2.8 seconds. > (large) returns 496 rows in 17 seconds. > > Running the intermediate results into a permanent table: > > truncate table temp; > insert into temp select master_id from detail group by master_id having > COUNT(*) > 2; > select * from master where master.id in (select id from temp); > > (small) returns 76 rows in 0.16 seconds. > (large) returns 496 rows in 0.17 seconds. > > Have tried playing around with some of the system variables: > query_cache_size, innodb_buffer_pool_size with no real affect. > > In our production environment (record size is much larger, similar > number of records to the large test set), both the nested query and the > two-part query using a temporary query hang for indeterminate (> 6 hrs) > amounts of time, leaving the use of a permanent table as the only > option. Of course, the only real way to manage this is to create a > dedicated scratch table for each user of the system, a somewhat onerous > workaround. > > Anyone have an idea on a solution to this? Is there something in > setting up for INNO DB that I am missing, or should I file this as a > bug? > > Thanks. > > - Leo > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]