It is a useful functionality, especially since the SELECT list can contain mix of primitive types and array or map type columns. I do see your point about the Postgres semantics of ORDER BY vs DISTINCT ON. I don't like the hidden semantics of MySQL either. Perhaps the compromise solution is to introduce functions such as FIRST_ROW() aggregation function (similar to lead/lag window functions) and use GROUP BY.
-Aman On Wed, Feb 21, 2018 at 1:31 PM, Julian Hyde <jh...@apache.org> wrote: > I can see that it would be useful. But there are a couple of things about > it that are messy. > > I think the syntax is a bit clunky because it uses parentheses; a query > would be difficult to read if people would like multiple columns, > expressions, and aliases. > > I also think the semantics are messy. If you read > https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT < > https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT> > you will see that “DISTINCT ON” is evaluated after the ORDER BY clause, > whereas regular DISTINCT is evaluated before the ORDER BY clause. So there > will be a bizarre interaction if DISTINCT ON is used with UNION and ORDER > BY. > > By the way, good ol’ MySQL doesn’t have this problem; you can just write > > SELECT a, b, c FROM t GROUP BY a > > and it just picks the first value of b and c. This “feature” is one of my > least favorite things about MySQL, so let’s not emulate it. :) > > Julian > > > > On Feb 21, 2018, at 12:44 PM, Aman Sinha <amansi...@apache.org> wrote: > > > > The DISTINCT 'ON' clause is not supported but I am trying to see if there > > are other people who have run into this. One of the use cases I have > > intuitively maps to something like this: > > > > SELECT DISTINCT *ON (a)*, b, c FROM T > > > > Here suppose 'a' is an INT and b, c are some complex types such as > > array. In my example, the values of b and c happen to be the same in > all > > rows belonging to the same group of 'a', so I just want the first row. > > Since these are arrays, I cannot use MIN(b) GROUP BY a . Other > > alternative is to create a new aggregate function that picks the first > > value but it makes the syntax verbose especially with large number of > such > > columns. > > > > Incidentally, Postgres supports this [1] > > > > [1] > > https://www.postgresql.org/docs/current/static/sql- > select.html#SQL-DISTINCT > > > > > > -Aman > >