[
https://issues.apache.org/jira/browse/CALCITE-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080071#comment-18080071
]
Julian Hyde commented on CALCITE-7517:
--------------------------------------
The ANY_VALUE function was added in CALCITE-2366 to support queries that
otherwise would use DISTINCT ON. Please comment on that.
Can DISTINCT ON be handled as syntactic sugar, or does it need changes to the
RelNode algebra? Please walk through a couple of examples.
> Support DISTINCT ON clause in SELECT statements
> -----------------------------------------------
>
> Key: CALCITE-7517
> URL: https://issues.apache.org/jira/browse/CALCITE-7517
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.41.0
> Reporter: Yu Xu
> Assignee: Yu Xu
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> Currently Calcite not support DISTINCT ON syntax, such as:
>
> {code:java}
> SELECT DISTINCT ON (deptno) empno, ename
> FROM emp
> ORDER BY deptno, hiredate DESC;{code}
>
> many mainstream databases support this syntax, such as Postgresql ClickHouse:
> PostgreSql: [https://www.postgresql.org/docs/18/sql-select.html]
> ClickHouse:
> [https://clickhouse.com/docs/sql-reference/statements/select/distinct]
> Benefits of introducing {{DISTINCT ON}} syntax into Calcite:
> *1. PostgreSQL Compatibility*
> {{DISTINCT ON}} is a standard PostgreSQL extension. Supporting it allows
> Calcite-based systems (and downstream projects like Apache Flink, Hive, etc.)
> to parse and execute SQL written for PostgreSQL without manual rewrite.
> *2.Cleaner Semantics for First-Row-per-Group Queries*
> It solves the common "get the first row per partition" pattern directly and
> declaratively. Without {{{}DISTINCT ON{}}}, users must write verbose
> {{ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)}} window expressions
> wrapped in a CTE or sub-query, which is harder to read and maintain.
> *3.Reduced Boilerplate*
> Compare the two approaches:
> {code:java}
> -- With DISTINCT ON (concise, intent is obvious)
> SELECT DISTINCT ON (deptno) empno, ename
> FROM emp
> ORDER BY deptno, hiredate DESC;
> -- Without it (verbose, error-prone)
> WITH ranked AS (
> SELECT empno, ename,
> ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY hiredate DESC) AS rn
> FROM emp
> )
> SELECT empno, ename FROM ranked WHERE rn = 1; {code}
> *4.Better Integration with ORDER BY and LIMIT*
> Because {{DISTINCT ON}} preserves the original {{ORDER BY}} semantics (it
> only deduplicates based on the specified prefix), {{LIMIT}} and {{OFFSET}}
> behave naturally after the deduplication step. The optimizer can also reason
> about the sort order more easily than with an opaque window-filter pattern.
> *5. Easier for Query Builders and BI Tools*
> Many tools generate SQL programmatically. Having a single clause for "keep
> the best match per group" simplifies code generation and avoids the
> complexity of synthesizing window functions and outer filters.
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)