[
https://issues.apache.org/jira/browse/CALCITE-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated CALCITE-7517:
------------------------------------
Labels: pull-request-available (was: )
> Support DISTINCT ON clause in SELECT statements
> -----------------------------------------------
>
> Key: CALCITE-7517
> URL: https://issues.apache.org/jira/browse/CALCITE-7517
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.41.0
> Reporter: Yu Xu
> Assignee: Yu Xu
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> Currently Caclite not support DISTINCT ON syntax, such as:
>
> {code:java}
> SELECT DISTINCT ON (deptno) empno, ename
> FROM emp
> ORDER BY deptno, hiredate DESC;{code}
>
> many mainstream databases support this syntax, such as Postgresql ClickHouse:
> PostgreSql: https://www.postgresql.org/docs/18/sql-select.html
> ClickHouse:
> [https://clickhouse.com/docs/sql-reference/statements/select/distinct]
> Benefits of introducing {{DISTINCT ON}} syntax into Calcite:
> *1. PostgreSQL Compatibility*
> {{DISTINCT ON}} is a standard PostgreSQL extension. Supporting it allows
> Calcite-based systems (and downstream projects like Apache Flink, Hive, etc.)
> to parse and execute SQL written for PostgreSQL without manual rewrite.
> *2.Cleaner Semantics for First-Row-per-Group Queries*
> It solves the common "get the first row per partition" pattern directly and
> declaratively. Without {{{}DISTINCT ON{}}}, users must write verbose
> {{ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)}} window expressions
> wrapped in a CTE or sub-query, which is harder to read and maintain.
> *3.Reduced Boilerplate*
> Compare the two approaches:
> {code:java}
> -- With DISTINCT ON (concise, intent is obvious)
> SELECT DISTINCT ON (deptno) empno, ename
> FROM emp
> ORDER BY deptno, hiredate DESC;
> -- Without it (verbose, error-prone)
> WITH ranked AS (
> SELECT empno, ename,
> ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY hiredate DESC) AS rn
> FROM emp
> )
> SELECT empno, ename FROM ranked WHERE rn = 1; {code}
> *4.Better Integration with ORDER BY and LIMIT*
> Because {{DISTINCT ON}} preserves the original {{ORDER BY}} semantics (it
> only deduplicates based on the specified prefix), {{LIMIT}} and {{OFFSET}}
> behave naturally after the deduplication step. The optimizer can also reason
> about the sort order more easily than with an opaque window-filter pattern.
> *5. Easier for Query Builders and BI Tools*
> Many tools generate SQL programmatically. Having a single clause for "keep
> the best match per group" simplifies code generation and avoids the
> complexity of synthesizing window functions and outer filters.
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)