[jira] [Commented] (CALCITE-2366) Add support for ANY_VALUE function

Yu Xu (Jira) Mon, 11 May 2026 19:21:48 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080186#comment-18080186
 ]


Yu Xu commented on CALCITE-2366:
--------------------------------

There are still some differences between DISTINCT ON and ANY_VALUE in my view：

1.ANY_VALUE returns any row; which row it returns is uncertain and depends on 
the underlying executor implementation.

 
{code:java}
SELECT deptno, ANY_VALUE(empno), ANY_VALUE(ename)
FROM emp
GROUP BY deptno; {code}
You might get empno=7369 or empno=7499; the result is unpredictable. If you 
need the highest-paid or earliest-hired employee, ANY_VALUE simply won't meet 
your needs.

The `DISTINCT ON` directive, when used in conjunction with `ORDER BY`, allows 
for precise control over which rows are retained.

 
{code:java}
SELECT DISTINCT ON (deptno) empno, ename, sal
FROM emp
ORDER BY deptno, sal DESC; -- Explicitly retain the highest-paid employee in 
each department. {code}
 

 

 

2. When replacing DISTINCT ON with ANY_VALUE, you must: Wrap ANY_VALUE() around 
each of the non-grouping columns individually Loss sorting control The code 
becomes verbose and ambiguous.

Using `DISTINCT ON`, the declarative syntax immediately reveals that it "groups 
by `deptno`, retaining the first row of each sorted group," naturally 
integrating into the existing `ORDER BY` semantic system.

3. `DISTINCT ON` performs deduplication on an ordered data stream, therefore:

 
{code:java}
SELECT DISTINCT ON (deptno) empno, ename
FROM emp
ORDER BY deptno, hiredate DESC
LIMIT 10; {code}
The semantics are: first group by department, then sort by hire date in 
descending order within each group, take the first person in each group, and 
finally take the top 10 results. Using ANY_VALUE with subqueries makes it 
difficult to elegantly express this complex semantics of "sorting, 
partitioning, deduplicating, and truncating".

 

4. Better PostgreSQL ecosystem compatibility: DISTINCT ON is a signature syntax 
of PostgreSQL, used by a large number of existing queries, ORM frameworks, and 
BI tools. Calcite, as an SQL parsing and optimization infrastructure, supports 
this syntax and is directly compatible with these ecosystems, without requiring 
users to manually rewrite it as a {*}window function + CTE{*}.

5. After `DISTINCT ON` is converted to `ROW_NUMBER() OVER (PARTITION BY ... 
ORDER BY ...) + filter` in the RelOpt layer, Calcite's optimizer can: Utilize 
existing window function optimization rules Merge operators with Sort, Filter, 
and Project Select more efficient execution strategies (such as Top-N per 
partition) during the physical planning phase However, `ANY_VALUE`, as an 
aggregation function, has a completely different optimization path and cannot 
utilize sorting information.

> Add support for ANY_VALUE function
> ----------------------------------
>
>                 Key: CALCITE-2366
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2366
>             Project: Calcite
>          Issue Type: New Feature
>    Affects Versions: 1.16.0
>            Reporter: Gautam Parai
>            Assignee: Gautam Parai
>            Priority: Major
>             Fix For: 1.17.0
>
>
> We had discussions on the Apache Calcite [1] and Apache Drill [2] mailing 
> lists regarding an equivalent for DISTINCT ON. The community seems to prefer 
> the ANY_VALUE. This Jira is a placeholder for implementing the ANY_VALUE 
> aggregate function in Apache Calcite.  
> [1][https://lists.apache.org/thread.html/f2007a489d3a5741875bcc8a1edd8d5c3715e5114ac45058c3b3a42d@%3Cdev.calcite.apache.org%3E]
> [2][https://lists.apache.org/thread.html/2517eef7410aed4e88b9515f7e4256335215c1ad39a2676a08d21cb9@%3Cdev.drill.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-2366) Add support for ANY_VALUE function

Reply via email to