[ 
https://issues.apache.org/jira/browse/CALCITE-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Xu updated CALCITE-7517:
---------------------------
    Description: 
Currently Calcite not support DISTINCT ON syntax, such as:

 
{code:java}
SELECT DISTINCT ON (deptno) empno, ename
FROM emp
ORDER BY deptno, hiredate DESC;{code}
 

many mainstream databases support this syntax, such as Postgresql ClickHouse:

PostgreSql: [https://www.postgresql.org/docs/18/sql-select.html]

ClickHouse: 
[https://clickhouse.com/docs/sql-reference/statements/select/distinct]

Benefits of introducing {{DISTINCT ON}} syntax into Calcite:

*1. PostgreSQL Compatibility*

{{DISTINCT ON}} is a standard PostgreSQL extension. Supporting it allows 
Calcite-based systems (and downstream projects like Apache Flink, Hive, etc.) 
to parse and execute SQL written for PostgreSQL without manual rewrite.

*2.Cleaner Semantics for First-Row-per-Group Queries*

It solves the common "get the first row per partition" pattern directly and 
declaratively. Without {{{}DISTINCT ON{}}}, users must write verbose 
{{ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)}} window expressions 
wrapped in a CTE or sub-query, which is harder to read and maintain.

*3.Reduced Boilerplate*

Compare the two approaches:
{code:java}
-- With DISTINCT ON (concise, intent is obvious)
SELECT DISTINCT ON (deptno) empno, ename
FROM emp
ORDER BY deptno, hiredate DESC;

-- Without it (verbose, error-prone)
WITH ranked AS (
  SELECT empno, ename,
    ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY hiredate DESC) AS rn
  FROM emp
)
SELECT empno, ename FROM ranked WHERE rn = 1; {code}
*4.Better Integration with ORDER BY and LIMIT*

Because {{DISTINCT ON}} preserves the original {{ORDER BY}} semantics (it only 
deduplicates based on the specified prefix), {{LIMIT}} and {{OFFSET}} behave 
naturally after the deduplication step. The optimizer can also reason about the 
sort order more easily than with an opaque window-filter pattern.

*5. Easier for Query Builders and BI Tools*

Many tools generate SQL programmatically. Having a single clause for "keep the 
best match per group" simplifies code generation and avoids the complexity of 
synthesizing window functions and outer filters.

 

 

 

  was:
Currently Caclite not support DISTINCT ON syntax, such as:

 
{code:java}
SELECT DISTINCT ON (deptno) empno, ename
FROM emp
ORDER BY deptno, hiredate DESC;{code}
 

many mainstream databases support this syntax, such as Postgresql ClickHouse:

PostgreSql: https://www.postgresql.org/docs/18/sql-select.html

ClickHouse: 
[https://clickhouse.com/docs/sql-reference/statements/select/distinct]

Benefits of introducing {{DISTINCT ON}} syntax into Calcite:

*1. PostgreSQL Compatibility*

{{DISTINCT ON}} is a standard PostgreSQL extension. Supporting it allows 
Calcite-based systems (and downstream projects like Apache Flink, Hive, etc.) 
to parse and execute SQL written for PostgreSQL without manual rewrite.

*2.Cleaner Semantics for First-Row-per-Group Queries*

It solves the common "get the first row per partition" pattern directly and 
declaratively. Without {{{}DISTINCT ON{}}}, users must write verbose 
{{ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)}} window expressions 
wrapped in a CTE or sub-query, which is harder to read and maintain.

*3.Reduced Boilerplate*

Compare the two approaches:
{code:java}
-- With DISTINCT ON (concise, intent is obvious)
SELECT DISTINCT ON (deptno) empno, ename
FROM emp
ORDER BY deptno, hiredate DESC;

-- Without it (verbose, error-prone)
WITH ranked AS (
  SELECT empno, ename,
    ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY hiredate DESC) AS rn
  FROM emp
)
SELECT empno, ename FROM ranked WHERE rn = 1; {code}
*4.Better Integration with ORDER BY and LIMIT*

Because {{DISTINCT ON}} preserves the original {{ORDER BY}} semantics (it only 
deduplicates based on the specified prefix), {{LIMIT}} and {{OFFSET}} behave 
naturally after the deduplication step. The optimizer can also reason about the 
sort order more easily than with an opaque window-filter pattern.

*5. Easier for Query Builders and BI Tools*

Many tools generate SQL programmatically. Having a single clause for "keep the 
best match per group" simplifies code generation and avoids the complexity of 
synthesizing window functions and outer filters.

 

 

 


> Support DISTINCT ON clause in SELECT statements
> -----------------------------------------------
>
>                 Key: CALCITE-7517
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7517
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Yu Xu
>            Assignee: Yu Xu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.42.0
>
>
> Currently Calcite not support DISTINCT ON syntax, such as:
>  
> {code:java}
> SELECT DISTINCT ON (deptno) empno, ename
> FROM emp
> ORDER BY deptno, hiredate DESC;{code}
>  
> many mainstream databases support this syntax, such as Postgresql ClickHouse:
> PostgreSql: [https://www.postgresql.org/docs/18/sql-select.html]
> ClickHouse: 
> [https://clickhouse.com/docs/sql-reference/statements/select/distinct]
> Benefits of introducing {{DISTINCT ON}} syntax into Calcite:
> *1. PostgreSQL Compatibility*
> {{DISTINCT ON}} is a standard PostgreSQL extension. Supporting it allows 
> Calcite-based systems (and downstream projects like Apache Flink, Hive, etc.) 
> to parse and execute SQL written for PostgreSQL without manual rewrite.
> *2.Cleaner Semantics for First-Row-per-Group Queries*
> It solves the common "get the first row per partition" pattern directly and 
> declaratively. Without {{{}DISTINCT ON{}}}, users must write verbose 
> {{ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)}} window expressions 
> wrapped in a CTE or sub-query, which is harder to read and maintain.
> *3.Reduced Boilerplate*
> Compare the two approaches:
> {code:java}
> -- With DISTINCT ON (concise, intent is obvious)
> SELECT DISTINCT ON (deptno) empno, ename
> FROM emp
> ORDER BY deptno, hiredate DESC;
> -- Without it (verbose, error-prone)
> WITH ranked AS (
>   SELECT empno, ename,
>     ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY hiredate DESC) AS rn
>   FROM emp
> )
> SELECT empno, ename FROM ranked WHERE rn = 1; {code}
> *4.Better Integration with ORDER BY and LIMIT*
> Because {{DISTINCT ON}} preserves the original {{ORDER BY}} semantics (it 
> only deduplicates based on the specified prefix), {{LIMIT}} and {{OFFSET}} 
> behave naturally after the deduplication step. The optimizer can also reason 
> about the sort order more easily than with an opaque window-filter pattern.
> *5. Easier for Query Builders and BI Tools*
> Many tools generate SQL programmatically. Having a single clause for "keep 
> the best match per group" simplifies code generation and avoids the 
> complexity of synthesizing window functions and outer filters.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to