[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074311#comment-18074311
]
Julian Hyde commented on CALCITE-7463:
--------------------------------------
I agree that there are both *nondeterministic* operations and truly *random*
operations, but for these purposes we really care whether a data set is
*repeatable*.
In some databases (such as BigQuery), CTEs are repeatable. If I write
{code}
with T as <some query>
select * from T
except
select * from T
{code}
Calcite's default behavior for CTEs is to inline them (as if they were views),
so there is some chance that the two executions will produce different rows.
(In CALCITE-6716, I proposed a {{REPEATABLE}} keyword to explicitly declare
whether a CTE is repeatable.)
I think we need to parameterize the environment executing planner rules with a
boolean mode. The mode determines whether two RelNodes that are deeply-equal
are considered to be repeatable. It is valid to collapse
{code}
(SELECT mgr, comm FROM emp LIMIT 2)
UNION
(SELECT mgr, comm FROM emp LIMIT 2)
{code}
into
{code}SELECT DISTINCT mgr, comm FROM emp LIMIT 2{code}
if and only if the mode is true.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> -------------------------------------------------------
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Zhen Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)