Michael, Thanks for your suggestion. Here's the query that seems to have fixed the problem, without a DISTINCT clause:
select Documents.ID, Name, max(DownloadLog.AddedOn) as DownloadedOn from Documents, DocumentFiles, DownloadLog where Documents.ID = DocumentFiles.Document and DocumentFiles.ID = DownloadLog.DocumentFile group by Documents.ID order by DownloadedOn desc limit 10 It seems to work perfectly. Thanks again! Tim Gustafson MEI Technology Consulting, Inc [EMAIL PROTECTED] (516) 379-0001 Office (516) 480-1870 Mobile/Emergencies (516) 908-4185 Fax http://www.meitech.com/ -----Original Message----- From: Michael Stassen [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 26, 2005 12:08 AM To: Frederic Wenzel Cc: Gustafson, Tim; mysql@lists.mysql.com Subject: Re: Problems with "select distinct" Frederic Wenzel wrote: > On Tue, 25 Jan 2005 08:44:45 -0500, Michael Stassen > <[EMAIL PROTECTED]> wrote: > >>I suspect mysql is doing the DISTINCT before the ORDER BY. One of the >>reasons I avoid DISTINCT when possible. > > Isn't this supposed to be correct? Ordering has to take place as the > very last operation, after any selection and projection, doesn't it? > > Regards > Fred In theory, or in practice? In practice, ordering is sometimes done ahead of time using an index, and DISTINCT may be optimized as a GROUP BY. In theory, ORDER BY sorts the rows and DISTINCT throws out duplicates. So long as we sort by columns included in the SELECT output, it won't make any difference which you do first and which last. The problem comes when we sort on columns not in the selected output. Now that I've thought about it some more, I don't believe order of operations matters at all in that case, because which duplicate rows are thrown away by DISTINCT is not defined. You see? Suppose, prior to DISTINCT or ORDER BY, you have these rows: ID Name +--+------+ 10 Test 1 10 Test 1 11 Test 2 10 Test 1 11 Test 2 Which two rows should DISTINCT keep? Even if you sort first, DISTINCT is under no obligation to choose the first rows it finds. That DISTINCT may be optimized as a GROUP BY is instructive. MySQL lets you do things like SELECT id, name, updated FROM mytable GROUP BY id, name; but the 'updated' column is chosen more at less randomly for each group. That is, for each group, you simply get one value of updated from an undetermined row which belongs to the group. I think that's what's happening here. The lesson is that you cannot use DISTINCT and then ORDER BY an unselected column and get meaningful results. I'm not a big fan of DISTINCT. We get a lot of questions on the list that amount to "Here's my query, but it gives me more rows than I want. I tried to fix it by adding DISTINCT, but now I don't get the right result." More often than not, there is a better query which explicitly retrieves precisely the desired rows, with no need for DISTINCT. Michael
smime.p7s
Description: S/MIME cryptographic signature