Re: Why does DISTINCT take so long time ??
Hi! Sinisa == Sinisa Milivojevic [EMAIL PROTECTED] writes: Sinisa Fournier Jocelyn [Presence-PC] writes: Hi, Yes it is 4.0.1. But the first query has also to make a temporary table with 2 million rows, it's why I don't understand the delta between the query without DISTINCT and the query with DISTINCT. The remove duplicates doesn't occurs after the join was performed ?? (it should be really fast in this case) Regards, Jocelyn Sinisa DISTINCT simply has to re-iterate. I will try to explain what happens in this case: mysql SELECT DISTINCT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; | searchhardwarefr3 | index | NULL | PRIMARY | 75 | NULL | 2026032 | Using index; Using temporary | MySQL has 4 different way to optimize DISTINCT queries. This is the worst case for the distinct optimization for MySQL and we really have to fix this in the near future. The worst case is when: - You are doing a DISTINCT on a GROUP BY and the result set doesn't fit into a HEAP temporary table (tmp_table_size) and the result set is larger than sort_buffer_size. In this case MySQL does the following loop: - Iterate through all rows in - Iterate through all rows after the current one in the temporary table - Delete the row in the inner iteration if the rows are the same - Sort rows - Send them to the client. (Yes, I know that his is a hack, but this is not a common query...) The right way to fix this would be to create an unique index over the result rows, and let the REPAIR code regenerate the index, removing duplicate rows as it finds them. We will put this in our TODO to get this fixed in the near future. Regards, Monty - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Why does DISTINCT take so long time ??
Hi, Surely, the having clause is not redundant ( I misread it as 0). Is it not the compound index (on mot,date,numresponse) which would make the query fast instead of three single column indexes? Thanks, Anvar. At 01:40 PM 19/01/2002 -0500, you wrote: Anvar had some very good explanations about the time it takes to run the queries. ##Here are some work arounds: ##If you need to have these columns (mot, date, numresponse) in the group by clause, ##try putting an index on each of them to speed it up. mysqlalter table searchhardwarefr3 add index idx_mot(mot); ##... etc. ##this should speed up the 1st query for sure. ##if the second query is still slow, (i'm not sure about the exact details of mysql, so this might or might not make a difference) ## put the results from your first query into a temporary table (mytemp): mysqlcreate temporary table mytemp SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; ## then explicitly index both columns mysql alter table mytemp add index idx_count(count); mysql alter table mytemp add index idx_numresponse(numresponse) ##and then run the following query mysqlselect distinct count, numresponse from temp (mytemp) ##by the way, I don't think the HAVING clause is redundant. Good Luck. - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail mysql-unsubscribe-##L=##[EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Why does DISTINCT take so long time ??
Hi, In fact I have a compound index on (mot,date,numreponse,topic), so I assume (mot,date,numreponse) is also optimised if I correctly understand how indexs work ;) Anyway, I just wanted to run this query once, so it's not really my priority to make speed faster on this particular query (moreover putting a specific index on a table with more than 2M table will take a big amount of time, probably as long as my query without DISTINCT ;)) Jocelyn - Original Message - From: Anvar Hussain K.M. [EMAIL PROTECTED] To: Kalok Lo [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Monday, January 21, 2002 5:57 AM Subject: Re: Why does DISTINCT take so long time ?? Hi, Surely, the having clause is not redundant ( I misread it as 0). Is it not the compound index (on mot,date,numresponse) which would make the query fast instead of three single column indexes? Thanks, Anvar. At 01:40 PM 19/01/2002 -0500, you wrote: Anvar had some very good explanations about the time it takes to run the queries. ##Here are some work arounds: ##If you need to have these columns (mot, date, numresponse) in the group by clause, ##try putting an index on each of them to speed it up. mysqlalter table searchhardwarefr3 add index idx_mot(mot); ##... etc. ##this should speed up the 1st query for sure. ##if the second query is still slow, (i'm not sure about the exact details of mysql, so this might or might not make a difference) ## put the results from your first query into a temporary table (mytemp): mysqlcreate temporary table mytemp SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; ## then explicitly index both columns mysql alter table mytemp add index idx_count(count); mysql alter table mytemp add index idx_numresponse(numresponse) ##and then run the following query mysqlselect distinct count, numresponse from temp (mytemp) ##by the way, I don't think the HAVING clause is redundant. Good Luck. - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail mysql-unsubscribe-##L=##[EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail mysql-unsubscribe-##L=##[EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Why does DISTINCT take so long time ??
Anvar had some very good explanations about the time it takes to run the queries. ##Here are some work arounds: ##If you need to have these columns (mot, date, numresponse) in the group by clause, ##try putting an index on each of them to speed it up. mysqlalter table searchhardwarefr3 add index idx_mot(mot); ##... etc. ##this should speed up the 1st query for sure. ##if the second query is still slow, (i'm not sure about the exact details of mysql, so this might or might not make a difference) ## put the results from your first query into a temporary table (mytemp): mysqlcreate temporary table mytemp SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; ## then explicitly index both columns mysql alter table mytemp add index idx_count(count); mysql alter table mytemp add index idx_numresponse(numresponse) ##and then run the following query mysqlselect distinct count, numresponse from temp (mytemp) ##by the way, I don't think the HAVING clause is redundant. Good Luck. - Original Message - From: Anvar Hussain K.M. [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, January 19, 2002 2:27 AM Subject: Re: Why does DISTINCT take so long time ?? Hi, Yes, the reason for the time difference is that for distinct query, as Sinisa noted, it has to reiterate. For the output to generate, first the rows have to be ordered ( in this case since count(*) is given every column should be present in the comparison.) using a temp table (or any other mechanism to keep rows ordered). For the first query also there should be an intermediate temp table to order rows but here it is only necessary to consider the columns in the group by clause. Considering these, a rough and primitive estimate of the time taken to execute the query can be found. Suppose the average length of a row is 300 bytes and the three columns in the group by clause takes 30 bytes average. Then if the first query takes 15 minutes, the second query will take 150 minutes. This may not be the real scenario with mysql but some thing similar. The having clause I feel, is redundant. the 15 min for the first qurey seem too much, perhaps indexing might help. Hope somebody else has a better explanation. Anvar. At 02:41 AM 18/01/2002 +0100, you wrote: Hi, I've notice sometimes DISTINCT clause take a really high amount of time to remove duplicates whereas it should be really quick (I assume it should be ;)) My first query is : mysql SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; it returns : +---++ | count | numreponse | +---++ | 2 | 111239 | | 2 | 108183 | | 2 | 73 | | 2 | 111383 | cut | 2 | 111239 | | 2 | 111760 | | 3 | 109166 | | 2 | 09 | | 3 | 109166 | +---++ 58 rows in set (14 min 51.15 sec) My second query is : mysql SELECT DISTINCT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; Well I'm not enough patient to wait, but when I stop the querie, it has been running for more than 3500 seconds... (and more than 45mn in 'Removing duplicates' state...) mysql EXPLAIN SELECT DISTINCT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; +---+---+---+-+-+--+--- - -+--+ | table | type | possible_keys | key | key_len | ref | rows| Extra| +---+---+---+-+-+--+--- - -+--+ | searchhardwarefr3 | index | NULL | PRIMARY | 75 | NULL | 2026032 | Using index; Using temporary | +---+---+---+-+-+--+--- - -+--+ 1 row in set (0.00 sec) mysql EXPLAIN SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; +---+---+---+-+-+--+--- - -+--+ | table | type | possible_keys | key | key_len | ref | rows| Extra| +---+---+---+-+-+--+--- - -+--+ | searchhardwarefr3 | index | NULL | PRIMARY | 75 | NULL | 2026032 | Using index; Using temporary | +---+---+---+-+-+--+--- - -+--+ 1 row in set (0.00 sec) Why does it take so much time to remove duplicates in only 58 rows ?? Thank you :) Regards
Re: Why does DISTINCT take so long time ??
Fournier Jocelyn [Presence-PC] writes: Hi, I've notice sometimes DISTINCT clause take a really high amount of time to remove duplicates whereas it should be really quick (I assume it should be ;)) [skip] Why does it take so much time to remove duplicates in only 58 rows ?? Thank you :) Regards, Jocelyn Fournier Presence-PC Hi! Because it has to make first a temporary table with 2 million rows. And is it 4.0.1 ? -- Regards, __ ___ ___ __ / |/ /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Fulltime Developer /_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus ___/ www.mysql.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Why does DISTINCT take so long time ??
Hi, Yes it is 4.0.1. But the first query has also to make a temporary table with 2 million rows, it's why I don't understand the delta between the query without DISTINCT and the query with DISTINCT. The remove duplicates doesn't occurs after the join was performed ?? (it should be really fast in this case) Regards, Jocelyn - Original Message - From: Sinisa Milivojevic [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Friday, January 18, 2002 3:57 PM Subject: Re: Why does DISTINCT take so long time ?? Fournier Jocelyn [Presence-PC] writes: Hi, I've notice sometimes DISTINCT clause take a really high amount of time to remove duplicates whereas it should be really quick (I assume it should be ;)) [skip] Why does it take so much time to remove duplicates in only 58 rows ?? Thank you :) Regards, Jocelyn Fournier Presence-PC Hi! Because it has to make first a temporary table with 2 million rows. And is it 4.0.1 ? -- Regards, __ ___ ___ __ / |/ /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Fulltime Developer /_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus ___/ www.mysql.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Why does DISTINCT take so long time ??
Fournier Jocelyn [Presence-PC] writes: Hi, Yes it is 4.0.1. But the first query has also to make a temporary table with 2 million rows, it's why I don't understand the delta between the query without DISTINCT and the query with DISTINCT. The remove duplicates doesn't occurs after the join was performed ?? (it should be really fast in this case) Regards, Jocelyn DISTINCT simply has to re-iterate. -- Regards, __ ___ ___ __ / |/ /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Fulltime Developer /_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus ___/ www.mysql.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Why does DISTINCT take so long time ??
I understand it has to re-iterate if the number of rows in the result without DISTINCT is greater than the limit clause, but if the result without DISTINCT is lower, it should be faster to perform the DISTINCT on the result directly (or perhaps I'm missing something ? ;)). - Original Message - From: Sinisa Milivojevic [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Friday, January 18, 2002 4:27 PM Subject: Re: Why does DISTINCT take so long time ?? Fournier Jocelyn [Presence-PC] writes: Hi, Yes it is 4.0.1. But the first query has also to make a temporary table with 2 million rows, it's why I don't understand the delta between the query without DISTINCT and the query with DISTINCT. The remove duplicates doesn't occurs after the join was performed ?? (it should be really fast in this case) Regards, Jocelyn DISTINCT simply has to re-iterate. -- Regards, __ ___ ___ __ / |/ /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Fulltime Developer /_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus ___/ www.mysql.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Why does DISTINCT take so long time ??
Hi, Yes, the reason for the time difference is that for distinct query, as Sinisa noted, it has to reiterate. For the output to generate, first the rows have to be ordered ( in this case since count(*) is given every column should be present in the comparison.) using a temp table (or any other mechanism to keep rows ordered). For the first query also there should be an intermediate temp table to order rows but here it is only necessary to consider the columns in the group by clause. Considering these, a rough and primitive estimate of the time taken to execute the query can be found. Suppose the average length of a row is 300 bytes and the three columns in the group by clause takes 30 bytes average. Then if the first query takes 15 minutes, the second query will take 150 minutes. This may not be the real scenario with mysql but some thing similar. The having clause I feel, is redundant. the 15 min for the first qurey seem too much, perhaps indexing might help. Hope somebody else has a better explanation. Anvar. At 02:41 AM 18/01/2002 +0100, you wrote: Hi, I've notice sometimes DISTINCT clause take a really high amount of time to remove duplicates whereas it should be really quick (I assume it should be ;)) My first query is : mysql SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; it returns : +---++ | count | numreponse | +---++ | 2 | 111239 | | 2 | 108183 | | 2 | 73 | | 2 | 111383 | cut | 2 | 111239 | | 2 | 111760 | | 3 | 109166 | | 2 | 09 | | 3 | 109166 | +---++ 58 rows in set (14 min 51.15 sec) My second query is : mysql SELECT DISTINCT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; Well I'm not enough patient to wait, but when I stop the querie, it has been running for more than 3500 seconds... (and more than 45mn in 'Removing duplicates' state...) mysql EXPLAIN SELECT DISTINCT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; +---+---+---+-+-+--+ -+--+ | table | type | possible_keys | key | key_len | ref | rows| Extra| +---+---+---+-+-+--+ -+--+ | searchhardwarefr3 | index | NULL | PRIMARY | 75 | NULL | 2026032 | Using index; Using temporary | +---+---+---+-+-+--+ -+--+ 1 row in set (0.00 sec) mysql EXPLAIN SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; +---+---+---+-+-+--+ -+--+ | table | type | possible_keys | key | key_len | ref | rows| Extra| +---+---+---+-+-+--+ -+--+ | searchhardwarefr3 | index | NULL | PRIMARY | 75 | NULL | 2026032 | Using index; Using temporary | +---+---+---+-+-+--+ -+--+ 1 row in set (0.00 sec) Why does it take so much time to remove duplicates in only 58 rows ?? Thank you :) Regards, Jocelyn Fournier Presence-PC - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Why does DISTINCT take so long time ??
Hi, I've notice sometimes DISTINCT clause take a really high amount of time to remove duplicates whereas it should be really quick (I assume it should be ;)) My first query is : mysql SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; it returns : +---++ | count | numreponse | +---++ | 2 | 111239 | | 2 | 108183 | | 2 | 73 | | 2 | 111383 | cut | 2 | 111239 | | 2 | 111760 | | 3 | 109166 | | 2 | 09 | | 3 | 109166 | +---++ 58 rows in set (14 min 51.15 sec) My second query is : mysql SELECT DISTINCT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; Well I'm not enough patient to wait, but when I stop the querie, it has been running for more than 3500 seconds... (and more than 45mn in 'Removing duplicates' state...) mysql EXPLAIN SELECT DISTINCT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; +---+---+---+-+-+--+ -+--+ | table | type | possible_keys | key | key_len | ref | rows| Extra| +---+---+---+-+-+--+ -+--+ | searchhardwarefr3 | index | NULL | PRIMARY | 75 | NULL | 2026032 | Using index; Using temporary | +---+---+---+-+-+--+ -+--+ 1 row in set (0.00 sec) mysql EXPLAIN SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100; +---+---+---+-+-+--+ -+--+ | table | type | possible_keys | key | key_len | ref | rows| Extra| +---+---+---+-+-+--+ -+--+ | searchhardwarefr3 | index | NULL | PRIMARY | 75 | NULL | 2026032 | Using index; Using temporary | +---+---+---+-+-+--+ -+--+ 1 row in set (0.00 sec) Why does it take so much time to remove duplicates in only 58 rows ?? Thank you :) Regards, Jocelyn Fournier Presence-PC - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php