[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074311#comment-18074311
 ] 

Julian Hyde edited comment on CALCITE-7463 at 4/17/26 5:02 PM:
---------------------------------------------------------------

I agree that there are both *nondeterministic* operations and truly *random* 
operations, but for these purposes we really care whether a data set is 
*repeatable*.

In some databases, CTEs are repeatable. If I write
{code}
with T as <some query>
select * from T
except
select * from T
{code}

in BigQuery I am guaranteed to get zero rows.

Calcite's default behavior for CTEs is to inline them (as if they were views), 
so there is some chance that the two executions will produce different rows, 
and therefore the above {{EXCEPT}} query may not return empty. (In 
CALCITE-6716, I proposed a {{REPEATABLE}} keyword to explicitly declare whether 
a CTE is repeatable.)

I think we need to parameterize the environment executing planner rules with a 
boolean mode. The mode determines whether two RelNodes that are deeply-equal 
are considered to be repeatable. It is valid to collapse
{code}
(SELECT mgr, comm FROM emp LIMIT 2)
UNION
(SELECT mgr, comm FROM emp LIMIT 2) 
{code}
into
{code}SELECT DISTINCT mgr, comm FROM emp LIMIT 2{code}
if and only if the mode is true.


was (Author: julianhyde):
I agree that there are both *nondeterministic* operations and truly *random* 
operations, but for these purposes we really care whether a data set is 
*repeatable*.

In some databases (such as BigQuery), CTEs are repeatable. If I write
{code}
with T as <some query>
select * from T
except
select * from T
{code}

Calcite's default behavior for CTEs is to inline them (as if they were views), 
so there is some chance that the two executions will produce different rows. 
(In CALCITE-6716, I proposed a {{REPEATABLE}} keyword to explicitly declare 
whether a CTE is repeatable.)

I think we need to parameterize the environment executing planner rules with a 
boolean mode. The mode determines whether two RelNodes that are deeply-equal 
are considered to be repeatable. It is valid to collapse
{code}
(SELECT mgr, comm FROM emp LIMIT 2)
UNION
(SELECT mgr, comm FROM emp LIMIT 2) 
{code}
into
{code}SELECT DISTINCT mgr, comm FROM emp LIMIT 2{code}
if and only if the mode is true.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> -------------------------------------------------------
>
>                 Key: CALCITE-7463
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7463
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Zhen Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
>     LogicalProject(MGR=[$3], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
>     LogicalProject(MGR=[$3], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
>     LogicalProject(MGR=[$3], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to