Re: [HACKERS] ORDER BY and DISTINCT ON

Tom Lane Mon, 15 Dec 2003 07:57:25 -0800

Bruno Wolff III <[EMAIL PROTECTED]> writes:
> Specifically the interpretation I think makes sense is that
> SELECT DISTINCT ON (a, b, c) * FROM tablename ORDER BY d, e, f
> should be treated as equvialent to
> SELECT * FROM
>   (SELECT DISTINCT ON (a, b, c) FROM tablename ORDER BY a, b, c, d, e, f) AS t
>   ORDER BY d, e, f


Right, that's the same thing Neil said elsewhere in the thread.  I guess
I agree that this would be cleaner, and also that it doesn't seem like a
high priority problem.

BTW, you could imagine implementing DISTINCT and DISTINCT ON by a hash
table, if you didn't expect too many distinct rows.  The hash key would
be the DISTINCT columns, and in the DISTINCT ON case, when you get a new
row with DISTINCT columns matching an existing entry, you compare the
two rows using the ORDER BY key columns to decide which row to keep.  In
this formulation it's fairly obvious that the DISTINCT and ORDER BY keys
don't need to overlap at all.  This technique does not require presorted
input (good) but also delivers unsorted output (bad).  It could still be
a big win if you expect the DISTINCT filtering to reduce the number of
rows substantially, because you'd end up sorting a much smaller number
of rows.  I didn't get around to trying to implement this when I was
doing hash aggregation for 7.4, but it seems like a reasonable item for
the TODO list.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] ORDER BY and DISTINCT ON

Reply via email to