RE: Problems with select distinct

2005-01-26 Thread Gustafson, Tim
Michael,

Thanks for your suggestion.  Here's the query that seems to have fixed
the problem, without a DISTINCT clause:

select Documents.ID, 
   Name, 
   max(DownloadLog.AddedOn) as DownloadedOn 
from Documents, 
 DocumentFiles, 
 DownloadLog 
where Documents.ID = DocumentFiles.Document and 
  DocumentFiles.ID = DownloadLog.DocumentFile 
group by Documents.ID 
order by DownloadedOn desc 
limit 10

It seems to work perfectly.  Thanks again!

Tim Gustafson
MEI Technology Consulting, Inc
[EMAIL PROTECTED]
(516) 379-0001 Office
(516) 480-1870 Mobile/Emergencies
(516) 908-4185 Fax
http://www.meitech.com/ 



-Original Message-
From: Michael Stassen [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 26, 2005 12:08 AM
To: Frederic Wenzel
Cc: Gustafson, Tim; mysql@lists.mysql.com
Subject: Re: Problems with select distinct



Frederic Wenzel wrote:

 On Tue, 25 Jan 2005 08:44:45 -0500, Michael Stassen
 [EMAIL PROTECTED] wrote:
 
I suspect mysql is doing the DISTINCT before the ORDER BY.  One of the
reasons I avoid DISTINCT when possible.
 
 Isn't this supposed to be correct? Ordering has to take place as the
 very last operation, after any selection and projection, doesn't it?
 
 Regards
 Fred

In theory, or in practice?  In practice, ordering is sometimes done
ahead of 
time using an index, and DISTINCT may be optimized as a GROUP BY.  In 
theory, ORDER BY sorts the rows and DISTINCT throws out duplicates.  So
long 
as we sort by columns included in the SELECT output, it won't make any 
difference which you do first and which last.  The problem comes when we

sort on columns not in the selected output.  Now that I've thought about
it 
some more, I don't believe order of operations matters at all in that
case, 
because  which duplicate rows are thrown away by DISTINCT is not
defined. 
You see?  Suppose, prior to DISTINCT or ORDER BY, you have these rows:

   ID Name
  +--+--+
   10 Test 1
   10 Test 1
   11 Test 2
   10 Test 1
   11 Test 2

Which two rows should DISTINCT keep?  Even if you sort first, DISTINCT
is 
under no obligation to choose the first rows it finds.

That DISTINCT may be optimized as a GROUP BY is instructive.  MySQL lets
you 
do things like

   SELECT id, name, updated FROM mytable GROUP BY id, name;

but the 'updated' column is chosen more at less randomly for each group.

That is, for each group, you simply get one value of updated from an 
undetermined row which belongs to the group.  I think that's what's 
happening here.  The lesson is that you cannot use DISTINCT and then
ORDER 
BY an unselected column and get meaningful results.

I'm not a big fan of DISTINCT.  We get a lot of questions on the list
that 
amount to Here's my query, but it gives me more rows than I want.  I
tried 
to fix it by adding DISTINCT, but now I don't get the right result.
More 
often than not, there is a better query which explicitly retrieves
precisely 
the desired rows, with no need for DISTINCT.

Michael


smime.p7s
Description: S/MIME cryptographic signature


Re: Problems with select distinct

2005-01-25 Thread Michael Stassen
I suspect mysql is doing the DISTINCT before the ORDER BY.  One of the 
reasons I avoid DISTINCT when possible.  Try:

  SELECT Documents.ID, Name
  FROM Documents, DocumentFiles, DownloadLog
  WHERE Documents.ID = DocumentFiles.Document
AND DocumentFiles.ID = DownloadLog.DocumentFile
  GROUP BY Documents.ID, Name
  ORDER BY MAX(DownloadLog.AddedOn) DESC LIMIT 5;
Michael
Gustafson, Tim wrote:
Hello
I am trying to run the following query on my server:
select Documents.ID,
   Name
from Documents,
 DocumentFiles,
 DownloadLog
where Documents.ID = DocumentFiles.Document and
  DocumentFiles.ID = DownloadLog.DocumentFile
order by DownloadLog.AddedOn desc  
limit 5

It works correctly, except that it displays duplicate rows.  If I put a
distinct before Documents.ID, it gives me only unique rows, but the
rows are not ordered correctly.  For example, without the distinct
clause, I get the following rows:
ID Name
10 Test 1
10 Test 1
11 Test 2
10 Test 1
12 Test 3
With the distinct clause, I get this:
ID Name
12 Test 3
13 Test 4
10 Test 1
11 Test 2
14 Test 5
Why does adding the distinct clause change the order of the rows
completely?
Thanks in advance!
Tim Gustafson
MEI Technology Consulting, Inc
[EMAIL PROTECTED]
(516) 379-0001 Office
(516) 480-1870 Mobile/Emergencies
(516) 908-4185 Fax
http://www.meitech.com/ 


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: Problems with select distinct

2005-01-25 Thread Frederic Wenzel
On Tue, 25 Jan 2005 08:44:45 -0500, Michael Stassen
[EMAIL PROTECTED] wrote:
 I suspect mysql is doing the DISTINCT before the ORDER BY.  One of the
 reasons I avoid DISTINCT when possible.

Isn't this supposed to be correct? Ordering has to take place as the
very last operation, after any selection and projection, doesn't it?


Regards
Fred

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Problems with select distinct

2005-01-25 Thread Michael Stassen
Frederic Wenzel wrote:
On Tue, 25 Jan 2005 08:44:45 -0500, Michael Stassen
[EMAIL PROTECTED] wrote:
I suspect mysql is doing the DISTINCT before the ORDER BY.  One of the
reasons I avoid DISTINCT when possible.
Isn't this supposed to be correct? Ordering has to take place as the
very last operation, after any selection and projection, doesn't it?
Regards
Fred
In theory, or in practice?  In practice, ordering is sometimes done ahead of 
time using an index, and DISTINCT may be optimized as a GROUP BY.  In 
theory, ORDER BY sorts the rows and DISTINCT throws out duplicates.  So long 
as we sort by columns included in the SELECT output, it won't make any 
difference which you do first and which last.  The problem comes when we 
sort on columns not in the selected output.  Now that I've thought about it 
some more, I don't believe order of operations matters at all in that case, 
because  which duplicate rows are thrown away by DISTINCT is not defined. 
You see?  Suppose, prior to DISTINCT or ORDER BY, you have these rows:

  ID Name
 +--+--+
  10 Test 1
  10 Test 1
  11 Test 2
  10 Test 1
  11 Test 2
Which two rows should DISTINCT keep?  Even if you sort first, DISTINCT is 
under no obligation to choose the first rows it finds.

That DISTINCT may be optimized as a GROUP BY is instructive.  MySQL lets you 
do things like

  SELECT id, name, updated FROM mytable GROUP BY id, name;
but the 'updated' column is chosen more at less randomly for each group. 
That is, for each group, you simply get one value of updated from an 
undetermined row which belongs to the group.  I think that's what's 
happening here.  The lesson is that you cannot use DISTINCT and then ORDER 
BY an unselected column and get meaningful results.

I'm not a big fan of DISTINCT.  We get a lot of questions on the list that 
amount to Here's my query, but it gives me more rows than I want.  I tried 
to fix it by adding DISTINCT, but now I don't get the right result.  More 
often than not, there is a better query which explicitly retrieves precisely 
the desired rows, with no need for DISTINCT.

Michael
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]