RE: Problems with "select distinct"

Gustafson, Tim Wed, 26 Jan 2005 06:25:39 -0800

Michael,

Thanks for your suggestion.  Here's the query that seems to have fixed
the problem, without a DISTINCT clause:

select Documents.ID, 
       Name, 
       max(DownloadLog.AddedOn) as DownloadedOn 
from Documents, 
     DocumentFiles, 
     DownloadLog 
where Documents.ID = DocumentFiles.Document and 
      DocumentFiles.ID = DownloadLog.DocumentFile 
group by Documents.ID 
order by DownloadedOn desc 
limit 10

It seems to work perfectly.  Thanks again!

Tim Gustafson
MEI Technology Consulting, Inc
[EMAIL PROTECTED]
(516) 379-0001 Office
(516) 480-1870 Mobile/Emergencies
(516) 908-4185 Fax
http://www.meitech.com/ 

-----Original Message-----
From: Michael Stassen [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 26, 2005 12:08 AM
To: Frederic Wenzel
Cc: Gustafson, Tim; [email protected]
Subject: Re: Problems with "select distinct"

Frederic Wenzel wrote:

> On Tue, 25 Jan 2005 08:44:45 -0500, Michael Stassen
> <[EMAIL PROTECTED]> wrote:
> 
>>I suspect mysql is doing the DISTINCT before the ORDER BY.  One of the
>>reasons I avoid DISTINCT when possible.
> 
> Isn't this supposed to be correct? Ordering has to take place as the
> very last operation, after any selection and projection, doesn't it?
> 
> Regards
> Fred

In theory, or in practice?  In practice, ordering is sometimes done
ahead of 
time using an index, and DISTINCT may be optimized as a GROUP BY.  In 
theory, ORDER BY sorts the rows and DISTINCT throws out duplicates.  So
long 
as we sort by columns included in the SELECT output, it won't make any 
difference which you do first and which last.  The problem comes when we

sort on columns not in the selected output.  Now that I've thought about
it 
some more, I don't believe order of operations matters at all in that
case, 
because  which duplicate rows are thrown away by DISTINCT is not
defined. 
You see?  Suppose, prior to DISTINCT or ORDER BY, you have these rows:

   ID Name
  +--+------+
   10 Test 1
   10 Test 1
   11 Test 2
   10 Test 1
   11 Test 2

Which two rows should DISTINCT keep?  Even if you sort first, DISTINCT
is 
under no obligation to choose the first rows it finds.

That DISTINCT may be optimized as a GROUP BY is instructive.  MySQL lets
you 
do things like

   SELECT id, name, updated FROM mytable GROUP BY id, name;

but the 'updated' column is chosen more at less randomly for each group.

That is, for each group, you simply get one value of updated from an 
undetermined row which belongs to the group.  I think that's what's 
happening here.  The lesson is that you cannot use DISTINCT and then
ORDER 
BY an unselected column and get meaningful results.

I'm not a big fan of DISTINCT.  We get a lot of questions on the list
that 
amount to "Here's my query, but it gives me more rows than I want.  I
tried 
to fix it by adding DISTINCT, but now I don't get the right result."
More 
often than not, there is a better query which explicitly retrieves
precisely 
the desired rows, with no need for DISTINCT.

Michael

smime.p7s
Description: S/MIME cryptographic signature

RE: Problems with "select distinct"

Reply via email to