Re: Why does DISTINCT take so long time ??

2002-01-26 Thread Michael Widenius


Hi!

 Sinisa == Sinisa Milivojevic [EMAIL PROTECTED] writes:

Sinisa Fournier Jocelyn [Presence-PC] writes:
 Hi,
 
 Yes it is 4.0.1.
 But the first query has also to make a temporary table with 2 million rows,
 it's why I don't understand the delta between the query without DISTINCT and
 the query with DISTINCT.
 The remove duplicates doesn't occurs after the join was performed ?? (it
 should be really fast in this case)
 
 Regards,
 
 Jocelyn

Sinisa DISTINCT simply has to re-iterate.

I will try to explain what happens in this case:

mysql SELECT DISTINCT COUNT(*) as count, numreponse FROM searchhardwarefr3
GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;

| searchhardwarefr3 | index | NULL  | PRIMARY |  75 | NULL |
2026032 | Using index; Using temporary |

MySQL has 4 different way to optimize DISTINCT queries.

This is the worst case for the distinct optimization for MySQL and we
really have to fix this in the near future. The worst case is when:

- You are doing a DISTINCT on a GROUP BY and the result set doesn't
  fit into a HEAP temporary table (tmp_table_size) and the result
  set is larger than sort_buffer_size.

In this case MySQL does the following loop:

- Iterate through all rows in 
  - Iterate through all rows after the current one in the temporary table
- Delete the row in the inner iteration if the rows are the same
- Sort rows
- Send them to the client.

(Yes, I know that his is a hack, but this is not a common query...)

The right way to fix this would be to create an unique index over the
result rows, and let the REPAIR code regenerate the index, removing
duplicate rows as it finds them.

We will put this in our TODO to get this fixed in the near future.

Regards,
Monty

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Why does DISTINCT take so long time ??

2002-01-20 Thread Anvar Hussain K.M.

Hi,

Surely, the having clause is not redundant ( I misread it as  0).

Is it not the compound index (on mot,date,numresponse) which would
make the query fast instead of three single column indexes?

Thanks,
Anvar.

At 01:40 PM 19/01/2002 -0500, you wrote:
Anvar had some very good explanations about the time it takes to run the
queries.

##Here are some work arounds:
##If you need to have these columns (mot, date, numresponse) in the group by
clause,
##try putting an index on each of them to speed it up.

mysqlalter table searchhardwarefr3
 add index idx_mot(mot);

##... etc.

##this should speed up the 1st query for sure.
##if the second query is still slow, (i'm not sure about the exact details
of mysql, so this might or might not make a difference)

## put the results from your first query into a temporary table (mytemp):
mysqlcreate temporary table mytemp
  SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3
 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;
## then explicitly index both columns
mysql alter table mytemp
  add index idx_count(count);
mysql alter table mytemp
  add index idx_numresponse(numresponse)
##and then run the following query

mysqlselect distinct count, numresponse from temp (mytemp)

##by the way, I don't think the HAVING clause is redundant.


Good Luck.



-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail mysql-unsubscribe-##L=##[EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Why does DISTINCT take so long time ??

2002-01-20 Thread Jocelyn Fournier

Hi,

In fact I have a compound index on (mot,date,numreponse,topic), so I assume
(mot,date,numreponse) is also optimised if I correctly understand how indexs
work ;)
Anyway, I just wanted to run this query once, so it's not really my priority
to make speed faster on this particular query (moreover putting a specific
index on a table with more than 2M table will take a big amount of time,
probably as long as my query without DISTINCT ;))

Jocelyn
- Original Message -
From: Anvar Hussain K.M. [EMAIL PROTECTED]
To: Kalok Lo [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, January 21, 2002 5:57 AM
Subject: Re: Why does DISTINCT take so long time ??


 Hi,

 Surely, the having clause is not redundant ( I misread it as  0).

 Is it not the compound index (on mot,date,numresponse) which would
 make the query fast instead of three single column indexes?

 Thanks,
 Anvar.

 At 01:40 PM 19/01/2002 -0500, you wrote:
 Anvar had some very good explanations about the time it takes to run the
 queries.
 
 ##Here are some work arounds:
 ##If you need to have these columns (mot, date, numresponse) in the group
by
 clause,
 ##try putting an index on each of them to speed it up.
 
 mysqlalter table searchhardwarefr3
  add index idx_mot(mot);
 
 ##... etc.
 
 ##this should speed up the 1st query for sure.
 ##if the second query is still slow, (i'm not sure about the exact
details
 of mysql, so this might or might not make a difference)
 
 ## put the results from your first query into a temporary table (mytemp):
 mysqlcreate temporary table mytemp
   SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3
  GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;
 ## then explicitly index both columns
 mysql alter table mytemp
   add index idx_count(count);
 mysql alter table mytemp
   add index idx_numresponse(numresponse)
 ##and then run the following query
 
 mysqlselect distinct count, numresponse from temp (mytemp)
 
 ##by the way, I don't think the HAVING clause is redundant.
 
 
 Good Luck.



 -
 Before posting, please check:
http://www.mysql.com/manual.php   (the manual)
http://lists.mysql.com/   (the list archive)

 To request this thread, e-mail [EMAIL PROTECTED]
 To unsubscribe, e-mail mysql-unsubscribe-##L=##[EMAIL PROTECTED]
 Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail mysql-unsubscribe-##L=##[EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Why does DISTINCT take so long time ??

2002-01-19 Thread Kalok Lo

Anvar had some very good explanations about the time it takes to run the
queries.

##Here are some work arounds:
##If you need to have these columns (mot, date, numresponse) in the group by
clause,
##try putting an index on each of them to speed it up.

mysqlalter table searchhardwarefr3
add index idx_mot(mot);

##... etc.

##this should speed up the 1st query for sure.
##if the second query is still slow, (i'm not sure about the exact details
of mysql, so this might or might not make a difference)

## put the results from your first query into a temporary table (mytemp):
mysqlcreate temporary table mytemp
 SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3
GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;
## then explicitly index both columns
mysql alter table mytemp
 add index idx_count(count);
mysql alter table mytemp
 add index idx_numresponse(numresponse)
##and then run the following query

mysqlselect distinct count, numresponse from temp (mytemp)

##by the way, I don't think the HAVING clause is redundant.


Good Luck.


- Original Message -
From: Anvar Hussain K.M. [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, January 19, 2002 2:27 AM
Subject: Re: Why does DISTINCT take so long time ??


 Hi,

 Yes, the reason for the time difference is that for distinct query, as
 Sinisa noted, it has to reiterate.

 For the output to generate, first the rows have to be ordered ( in this
 case since count(*) is given every column
 should be present in the comparison.) using a temp table (or any other
 mechanism to keep rows ordered).

 For the first query also there should be an intermediate temp table to
 order rows but here it is only necessary to
 consider the columns in the group by clause.

 Considering these, a rough and primitive estimate of the time taken to
 execute the query can be found.

 Suppose the average length of a row is 300 bytes and the three columns in
 the group by clause takes
 30 bytes average.  Then if the first query takes 15 minutes, the second
 query will take 150 minutes.


 This may not be the real scenario with mysql but some thing similar.
 The having clause I feel, is redundant.  the 15 min for the first qurey
 seem too much, perhaps indexing might
 help.

 Hope somebody else  has a better explanation.
 Anvar.

 At 02:41 AM 18/01/2002 +0100, you wrote:
 Hi,
 
 I've notice sometimes DISTINCT clause take a really high amount of time
to
 remove duplicates whereas it should be really quick (I assume it should
be
 ;))
 
 My first query is :
 
 mysql SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP
BY
 mot,date,numreponse HAVING count1 LIMIT 100;
 
 it returns :
 
 +---++
 | count | numreponse |
 +---++
 | 2 | 111239 |
 | 2 | 108183 |
 | 2 | 73 |
 | 2 | 111383 |
 cut
 | 2 | 111239 |
 | 2 | 111760 |
 | 3 | 109166 |
 | 2 | 09 |
 | 3 | 109166 |
 +---++
 58 rows in set (14 min 51.15 sec)
 
 My second query is :
 
 
 mysql SELECT DISTINCT COUNT(*) as count, numreponse FROM
searchhardwarefr3
 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;
 
 Well I'm not enough patient to wait, but when I stop the querie, it has
been
 running for more than 3500 seconds... (and more than 45mn in 'Removing
 duplicates' state...)
 
 mysql EXPLAIN SELECT DISTINCT COUNT(*) as count, numreponse FROM
 searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;

+---+---+---+-+-+--+---
-
 -+--+
 | table | type  | possible_keys | key | key_len | ref  |
 rows| Extra|

+---+---+---+-+-+--+---
-
 -+--+
 | searchhardwarefr3 | index | NULL  | PRIMARY |  75 | NULL |
 2026032 | Using index; Using temporary |

+---+---+---+-+-+--+---
-
 -+--+
 1 row in set (0.00 sec)
 
 mysql EXPLAIN SELECT COUNT(*) as count, numreponse FROM
searchhardwarefr3
 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;

+---+---+---+-+-+--+---
-
 -+--+
 | table | type  | possible_keys | key | key_len | ref  |
 rows| Extra|

+---+---+---+-+-+--+---
-
 -+--+
 | searchhardwarefr3 | index | NULL  | PRIMARY |  75 | NULL |
 2026032 | Using index; Using temporary |

+---+---+---+-+-+--+---
-
 -+--+
 1 row in set (0.00 sec)
 
 
 Why does it take so much time to remove duplicates in only 58 rows ??
 
 Thank you :)
 
 Regards

Re: Why does DISTINCT take so long time ??

2002-01-18 Thread Sinisa Milivojevic

Fournier Jocelyn [Presence-PC] writes:
 Hi,
 
 I've notice sometimes DISTINCT clause take a really high amount of time to
 remove duplicates whereas it should be really quick (I assume it should be
 ;))


[skip]
 
 
 Why does it take so much time to remove duplicates in only 58 rows ??
 
 Thank you :)
 
 Regards,
 
 Jocelyn Fournier
 Presence-PC
 


Hi!

Because it has to make first a temporary table with 2 million rows.

And is it 4.0.1 ?

-- 
Regards,
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic [EMAIL PROTECTED]
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Fulltime Developer
/_/  /_/\_, /___/\___\_\___/   Larnaca, Cyprus
   ___/   www.mysql.com


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Why does DISTINCT take so long time ??

2002-01-18 Thread Fournier Jocelyn [Presence-PC]

Hi,

Yes it is 4.0.1.
But the first query has also to make a temporary table with 2 million rows,
it's why I don't understand the delta between the query without DISTINCT and
the query with DISTINCT.
The remove duplicates doesn't occurs after the join was performed ?? (it
should be really fast in this case)

Regards,

Jocelyn
- Original Message -
From: Sinisa Milivojevic [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, January 18, 2002 3:57 PM
Subject: Re: Why does DISTINCT take so long time ??


 Fournier Jocelyn [Presence-PC] writes:
  Hi,
 
  I've notice sometimes DISTINCT clause take a really high amount of time
to
  remove duplicates whereas it should be really quick (I assume it should
be
  ;))
 

 [skip]

 
  Why does it take so much time to remove duplicates in only 58 rows ??
 
  Thank you :)
 
  Regards,
 
  Jocelyn Fournier
  Presence-PC
 


 Hi!

 Because it has to make first a temporary table with 2 million rows.

 And is it 4.0.1 ?

 --
 Regards,
__  ___ ___   __
   /  |/  /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic [EMAIL PROTECTED]
  / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Fulltime Developer
 /_/  /_/\_, /___/\___\_\___/   Larnaca, Cyprus
___/   www.mysql.com




-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Why does DISTINCT take so long time ??

2002-01-18 Thread Sinisa Milivojevic

Fournier Jocelyn [Presence-PC] writes:
 Hi,
 
 Yes it is 4.0.1.
 But the first query has also to make a temporary table with 2 million rows,
 it's why I don't understand the delta between the query without DISTINCT and
 the query with DISTINCT.
 The remove duplicates doesn't occurs after the join was performed ?? (it
 should be really fast in this case)
 
 Regards,
 
 Jocelyn

DISTINCT simply has to re-iterate.

-- 
Regards,
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic [EMAIL PROTECTED]
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Fulltime Developer
/_/  /_/\_, /___/\___\_\___/   Larnaca, Cyprus
   ___/   www.mysql.com


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Why does DISTINCT take so long time ??

2002-01-18 Thread Fournier Jocelyn [Presence-PC]

I understand it has to re-iterate if the number of rows in the result
without DISTINCT is greater than the limit clause, but if the result without
DISTINCT is lower, it should be faster to perform the DISTINCT on the result
directly (or perhaps I'm missing something ? ;)).
- Original Message -
From: Sinisa Milivojevic [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, January 18, 2002 4:27 PM
Subject: Re: Why does DISTINCT take so long time ??


 Fournier Jocelyn [Presence-PC] writes:
  Hi,
 
  Yes it is 4.0.1.
  But the first query has also to make a temporary table with 2 million
rows,
  it's why I don't understand the delta between the query without DISTINCT
and
  the query with DISTINCT.
  The remove duplicates doesn't occurs after the join was performed ??
(it
  should be really fast in this case)
 
  Regards,
 
  Jocelyn

 DISTINCT simply has to re-iterate.

 --
 Regards,
__  ___ ___   __
   /  |/  /_ __/ __/ __ \/ /Mr. Sinisa Milivojevic [EMAIL PROTECTED]
  / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Fulltime Developer
 /_/  /_/\_, /___/\___\_\___/   Larnaca, Cyprus
___/   www.mysql.com


 -
 Before posting, please check:
http://www.mysql.com/manual.php   (the manual)
http://lists.mysql.com/   (the list archive)

 To request this thread, e-mail [EMAIL PROTECTED]
 To unsubscribe, e-mail
[EMAIL PROTECTED]
 Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Why does DISTINCT take so long time ??

2002-01-18 Thread Anvar Hussain K.M.

Hi,

Yes, the reason for the time difference is that for distinct query, as 
Sinisa noted, it has to reiterate.

For the output to generate, first the rows have to be ordered ( in this 
case since count(*) is given every column
should be present in the comparison.) using a temp table (or any other 
mechanism to keep rows ordered).

For the first query also there should be an intermediate temp table to 
order rows but here it is only necessary to
consider the columns in the group by clause.

Considering these, a rough and primitive estimate of the time taken to 
execute the query can be found.

Suppose the average length of a row is 300 bytes and the three columns in 
the group by clause takes
30 bytes average.  Then if the first query takes 15 minutes, the second 
query will take 150 minutes.


This may not be the real scenario with mysql but some thing similar.
The having clause I feel, is redundant.  the 15 min for the first qurey 
seem too much, perhaps indexing might
help.

Hope somebody else  has a better explanation.
Anvar.

At 02:41 AM 18/01/2002 +0100, you wrote:
Hi,

I've notice sometimes DISTINCT clause take a really high amount of time to
remove duplicates whereas it should be really quick (I assume it should be
;))

My first query is :

mysql SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY
mot,date,numreponse HAVING count1 LIMIT 100;

it returns :

+---++
| count | numreponse |
+---++
| 2 | 111239 |
| 2 | 108183 |
| 2 | 73 |
| 2 | 111383 |
cut
| 2 | 111239 |
| 2 | 111760 |
| 3 | 109166 |
| 2 | 09 |
| 3 | 109166 |
+---++
58 rows in set (14 min 51.15 sec)

My second query is :


mysql SELECT DISTINCT COUNT(*) as count, numreponse FROM searchhardwarefr3
GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;

Well I'm not enough patient to wait, but when I stop the querie, it has been
running for more than 3500 seconds... (and more than 45mn in 'Removing
duplicates' state...)

mysql EXPLAIN SELECT DISTINCT COUNT(*) as count, numreponse FROM
searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;
+---+---+---+-+-+--+
-+--+
| table | type  | possible_keys | key | key_len | ref  |
rows| Extra|
+---+---+---+-+-+--+
-+--+
| searchhardwarefr3 | index | NULL  | PRIMARY |  75 | NULL |
2026032 | Using index; Using temporary |
+---+---+---+-+-+--+
-+--+
1 row in set (0.00 sec)

mysql EXPLAIN SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3
GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;
+---+---+---+-+-+--+
-+--+
| table | type  | possible_keys | key | key_len | ref  |
rows| Extra|
+---+---+---+-+-+--+
-+--+
| searchhardwarefr3 | index | NULL  | PRIMARY |  75 | NULL |
2026032 | Using index; Using temporary |
+---+---+---+-+-+--+
-+--+
1 row in set (0.00 sec)


Why does it take so much time to remove duplicates in only 58 rows ??

Thank you :)

Regards,

Jocelyn Fournier
Presence-PC






-
Before posting, please check:
http://www.mysql.com/manual.php   (the manual)
http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail 
[EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php



-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Why does DISTINCT take so long time ??

2002-01-17 Thread Fournier Jocelyn [Presence-PC]

Hi,

I've notice sometimes DISTINCT clause take a really high amount of time to
remove duplicates whereas it should be really quick (I assume it should be
;))

My first query is :

mysql SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3 GROUP BY
mot,date,numreponse HAVING count1 LIMIT 100;

it returns :

+---++
| count | numreponse |
+---++
| 2 | 111239 |
| 2 | 108183 |
| 2 | 73 |
| 2 | 111383 |
cut
| 2 | 111239 |
| 2 | 111760 |
| 3 | 109166 |
| 2 | 09 |
| 3 | 109166 |
+---++
58 rows in set (14 min 51.15 sec)

My second query is :


mysql SELECT DISTINCT COUNT(*) as count, numreponse FROM searchhardwarefr3
GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;

Well I'm not enough patient to wait, but when I stop the querie, it has been
running for more than 3500 seconds... (and more than 45mn in 'Removing
duplicates' state...)

mysql EXPLAIN SELECT DISTINCT COUNT(*) as count, numreponse FROM
searchhardwarefr3 GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;
+---+---+---+-+-+--+
-+--+
| table | type  | possible_keys | key | key_len | ref  |
rows| Extra|
+---+---+---+-+-+--+
-+--+
| searchhardwarefr3 | index | NULL  | PRIMARY |  75 | NULL |
2026032 | Using index; Using temporary |
+---+---+---+-+-+--+
-+--+
1 row in set (0.00 sec)

mysql EXPLAIN SELECT COUNT(*) as count, numreponse FROM searchhardwarefr3
GROUP BY mot,date,numreponse HAVING count1 LIMIT 100;
+---+---+---+-+-+--+
-+--+
| table | type  | possible_keys | key | key_len | ref  |
rows| Extra|
+---+---+---+-+-+--+
-+--+
| searchhardwarefr3 | index | NULL  | PRIMARY |  75 | NULL |
2026032 | Using index; Using temporary |
+---+---+---+-+-+--+
-+--+
1 row in set (0.00 sec)


Why does it take so much time to remove duplicates in only 58 rows ??

Thank you :)

Regards,

Jocelyn Fournier
Presence-PC






-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php