[
https://issues.apache.org/jira/browse/CALCITE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Darpan Lunagariya (e6data computing) updated CALCITE-7551:
----------------------------------------------------------
Description:
Several optimizer and {{RelBuilder}}-level transformations inline a
lower-projection expression into multiple positions of an upper expression
without checking whether the inlined expression is non-deterministic. When the
lower
expression is non-deterministic in the strict sense (i.e. two calls within the
same statement may return different values — {{RAND()}}, sequence {{NEXTVAL}},
UUID generators), this silently converts one evaluation into many,
changing query semantics.
{{CURRENT_TIMESTAMP}} and the other time functions are not in scope: Calcite
considers them deterministic per statement (see the "Note on dynamic vs
non-deterministic" section below) and they are safe to duplicate.
h3. Minimal repro
{code:sql}
SELECT a, a + 1 AS b
FROM (SELECT rand() AS a)
{code}
h3. Current plan produced by {{SqlToRelConverter}}:
{noformat}
LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
LogicalValues(tuples=[[{ 0 }]])
{noformat}
The two {{RAND()}} calls are evaluated independently, so {{B - A}} is no longer
always {{1}}. The expected plan preserves the single evaluation by referencing
the projected column:
{noformat}
LogicalProject(A=[$0], B=[+($0, 1)])
LogicalProject(A=[RAND()])
LogicalValues(tuples=[[{ 0 }]])
{noformat}
h3. Affected code paths (confirmed by reproducer tests)
||Component||Mechanism||
|{{RelOptUtil.pushPastProjectUnlessBloat}}|The shuttle returned by
{{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the underlying
projection. Only {{RexOver}} is guarded; determinism is not.|
|{{RelBuilder.project_}} (bloat merge)|Calls {{pushPastProjectUnlessBloat}}
when merging adjacent {{Project}}s. This is why the bug surfaces in the very
first {{RelNode}} produced by {{SqlToRelConverter}}, before any rule fires.|
|{{ProjectMergeRule}}|Calls {{pushPastProjectUnlessBloat}}. Test: 1 → 2
{{RAND()}} calls.|
|{{FilterProjectTransposeRule}}|Calls {{pushPastProjectUnlessBloat}} on the
filter condition, and re-emits the original projection above the new filter.
Even with the helper guarded, the rebuilt {{Project(rand() AS a)}} on top of
{{Filter(rand() > ...)}} splits one evaluation into two. Test: 1 → 2.|
|{{JoinProjectTransposeRule}}|Builds {{RexProgram}}s for projection + join
condition, merges them via {{RexProgramBuilder.mergePrograms}}, then calls
{{mergedProgram.expandLocalRef(condition)}}. The expansion flattens local refs
back into a {{RexNode}} tree, inlining the underlying expression at every
occurrence. Test: 1 → 3.|
|{{SemiJoinProjectTransposeRule}}|Same mechanism as
{{JoinProjectTransposeRule}} — uses {{RexProgramBuilder.mergePrograms}} +
{{expandLocalRef}} to inline projected expressions into the semi-join
condition. Test: 1 → 2.|
h3. Confirmed unaffected (by the same tests)
- {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as
{{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}}
referenced multiple times. Sharing is preserved.
- {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
- {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the
projected expression is non-reducible.
- {{ProjectFilterTransposeRule}}, {{ProjectJoinTransposeRule}},
{{ProjectCorrelateTransposeRule}}, {{AggregateProjectMergeRule}} — confirmed
safe on tested patterns.
- {{FilterSetOpTransposeRule}} — textual count of {{RAND(}} doubles when the
filter is cloned to each {{UNION ALL}} branch, but this is NOT a semantic bug:
any single row only flows through one branch's filter, so per-row
evaluation count is unchanged.
h3. Note on dynamic vs non-deterministic ({{CURRENT_TIMESTAMP}})
Calcite already distinguishes "non-deterministic" from "dynamic" via two
independent operator flags, although the naming is non-obvious:
||Operator||{{isDeterministic()}}||{{isDynamicFunction()}}||Semantics||
|{{RAND()}}|false|false|Different value per call site, even within one
statement|
|{{CURRENT_TIMESTAMP}}|true (default; not overridden)|true|Same value at every
call site within one statement; can differ across executions|
So in Calcite's model {{CURRENT_TIMESTAMP}} is "deterministic per statement" —
duplicating its call sites is semantically harmless because all occurrences
resolve to the same instant at runtime. The {{isDynamicFunction()}} flag
governs plan-cache invalidation across executions, not duplication safety
within one execution.
The fix uses {{RexUtil.isDeterministic(...)}}, which already encodes this
distinction correctly: {{RAND}} is blocked from duplication;
{{CURRENT_TIMESTAMP}} (and the rest of {{SqlAbstractTimeFunction}}) is left
alone. No refinement
of Calcite's determinism notion is required for this fix, although the comment
thread notes that the two-flag model is worth revisiting more broadly — that is
out of scope here.
h3. Proposed fix
Four small changes, all in the spirit of the existing "skip rule if it would
touch non-determinism" pattern already used elsewhere in Calcite
({{ProjectReduceExpressionsRule}}, the {{containsOver}} skip in transpose
rules, etc.):
{{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per
bottom slot; if any bottom expression is non-deterministic
({{!RexUtil.isDeterministic(...)}}) and is referenced more than once, return
{{null}}. Fixes
{{ProjectMergeRule}}, {{FilterProjectTransposeRule}}'s helper call, and the
{{RelBuilder}} bloat merge in one place.
{{FilterProjectTransposeRule}} — additionally skip when any projected
expression is non-deterministic, parallel to the existing {{containsOver}}
skip. Required because the rule re-emits the original projection above the new
filter.
{{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel to
the existing {{containsOver}} skip on each side.
{{SemiJoinProjectTransposeRule}} — same skip as {{JoinProjectTransposeRule}}
(uses the same {{mergePrograms}} + {{expandLocalRef}} pattern).
was:
Several optimizer and RelBuilder-level transformations inline a
lower-projection expression into multiple positions of an upper expression
without checking whether the inlined expression is deterministic. When the
lower expression
is non-deterministic (e.g. RAND(), CURRENT_TIMESTAMP, sequence NEXTVAL, UUID
generators), this silently converts one evaluation into many, changing query
semantics.
h2. Minimal repro
{code:sql}
SELECT a, a + 1 AS b
FROM (SELECT rand() AS a)
{code}
Current plan produced by SqlToRelConverter:
{noformat}
LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
LogicalValues(tuples=[[{ 0 }]])
{noformat}
The two RAND() calls are evaluated independently, so B - A is no longer always
{{{}1{}}}. The expected plan preserves the single evaluation by referencing the
projected column:
{noformat}
LogicalProject(A=[$0], B=[+($0, 1)])
LogicalProject(A=[RAND()])
LogicalValues(tuples=[[{ 0 }]])
{noformat}
h2. Affected code paths (confirmed by reproducer tests)
||Component||Mechanism||
|{{RelOptUtil.pushPastProjectUnlessBloat}}|The shuttle returned by
{{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the underlying
projection. Only {{RexOver}} is guarded; determinism is not.|
|{{RelBuilder.project_}} (bloat merge)|Calls {{pushPastProjectUnlessBloat}}
when merging adjacent {{Project}}s. This is why the bug surfaces in the very
first {{RelNode}} produced by {{SqlToRelConverter}}, before any rule fires.|
|{{ProjectMergeRule}}|Calls {{pushPastProjectUnlessBloat}}. Test: 1 → 2
{{RAND()}} calls.|
|{{FilterProjectTransposeRule}}|Calls {{pushPastProjectUnlessBloat}} on the
filter condition, and re-emits the original projection above the new filter.
Even with the helper guarded, the rebuilt {{Project(rand() AS a)}} on top of
{{Filter(rand() > ...)}} splits one evaluation into two. Test: 1 → 2.|
|{{JoinProjectTransposeRule}}|Builds {{RexProgram}}s for projection + join
condition, merges them via {{RexProgramBuilder.mergePrograms}}, then calls
{{mergedProgram.expandLocalRef(condition)}}. The expansion flattens local refs
back into a {{RexNode}} tree, inlining the underlying expression at every
occurrence. Test: 1 → 3.|
h2. Confirmed unaffected (by the same tests)
- {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as
{{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}}
referenced multiple times. Sharing is preserved; useful property to think about
when designing a general fix.
- {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
- {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the
projected expression is non-reducible.
h2. Proposed fix
Three small changes, all guarding the existing inline-substitution paths
against duplication of non-deterministic sub-expressions:
{{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per
bottom slot; if any bottom expression is non-deterministic
({{{}!RexUtil.isDeterministic(...){}}}) and is referenced more than once,
return {{{}null{}}}. Fixes
{{{}ProjectMergeRule{}}}, {{{}FilterProjectTransposeRule{}}}'s helper call, and
the {{RelBuilder}} bloat merge in one place.
{{FilterProjectTransposeRule}} — additionally skip when any projected
expression is non-deterministic, parallel to the existing {{containsOver}}
skip. Required because the rule re-emits the original projection above the new
filter.
{{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel to
the existing {{containsOver}} skip on each side.
h2. Reproducer
A new test class {{NonDeterministicDuplicationTest}} builds each rule's input
programmatically via {{RelBuilder}} (with {{bloat = -1}} so the build-time
merge does not pre-flatten), applies a single rule via a {{{}HepPlanner{}}}, and
asserts the count of {{RAND(}} occurrences in the output plan does not increase.
> Project/Filter/Join transpose and merge rules can duplicate non-deterministic
> expressions (e.g. RAND())
> -------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-7551
> URL: https://issues.apache.org/jira/browse/CALCITE-7551
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.41.0
> Reporter: Darpan Lunagariya (e6data computing)
> Assignee: Darpan Lunagariya (e6data computing)
> Priority: Major
>
> Several optimizer and {{RelBuilder}}-level transformations inline a
> lower-projection expression into multiple positions of an upper expression
> without checking whether the inlined expression is non-deterministic. When
> the lower
> expression is non-deterministic in the strict sense (i.e. two calls within
> the same statement may return different values — {{RAND()}}, sequence
> {{NEXTVAL}}, UUID generators), this silently converts one evaluation into
> many,
> changing query semantics.
> {{CURRENT_TIMESTAMP}} and the other time functions are not in scope: Calcite
> considers them deterministic per statement (see the "Note on dynamic vs
> non-deterministic" section below) and they are safe to duplicate.
> h3. Minimal repro
> {code:sql}
> SELECT a, a + 1 AS b
> FROM (SELECT rand() AS a)
> {code}
> h3. Current plan produced by {{SqlToRelConverter}}:
> {noformat}
> LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
> LogicalValues(tuples=[[{ 0 }]])
> {noformat}
> The two {{RAND()}} calls are evaluated independently, so {{B - A}} is no
> longer always {{1}}. The expected plan preserves the single evaluation by
> referencing the projected column:
> {noformat}
> LogicalProject(A=[$0], B=[+($0, 1)])
> LogicalProject(A=[RAND()])
> LogicalValues(tuples=[[{ 0 }]])
> {noformat}
> h3. Affected code paths (confirmed by reproducer tests)
> ||Component||Mechanism||
> |{{RelOptUtil.pushPastProjectUnlessBloat}}|The shuttle returned by
> {{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the
> underlying projection. Only {{RexOver}} is guarded; determinism is not.|
> |{{RelBuilder.project_}} (bloat merge)|Calls {{pushPastProjectUnlessBloat}}
> when merging adjacent {{Project}}s. This is why the bug surfaces in the very
> first {{RelNode}} produced by {{SqlToRelConverter}}, before any rule fires.|
> |{{ProjectMergeRule}}|Calls {{pushPastProjectUnlessBloat}}. Test: 1 → 2
> {{RAND()}} calls.|
> |{{FilterProjectTransposeRule}}|Calls {{pushPastProjectUnlessBloat}} on the
> filter condition, and re-emits the original projection above the new filter.
> Even with the helper guarded, the rebuilt {{Project(rand() AS a)}} on top of
> {{Filter(rand() > ...)}} splits one evaluation into two. Test: 1 → 2.|
> |{{JoinProjectTransposeRule}}|Builds {{RexProgram}}s for projection + join
> condition, merges them via {{RexProgramBuilder.mergePrograms}}, then calls
> {{mergedProgram.expandLocalRef(condition)}}. The expansion flattens local refs
> back into a {{RexNode}} tree, inlining the underlying expression at every
> occurrence. Test: 1 → 3.|
> |{{SemiJoinProjectTransposeRule}}|Same mechanism as
> {{JoinProjectTransposeRule}} — uses {{RexProgramBuilder.mergePrograms}} +
> {{expandLocalRef}} to inline projected expressions into the semi-join
> condition. Test: 1 → 2.|
> h3. Confirmed unaffected (by the same tests)
> - {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as
> {{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}}
> referenced multiple times. Sharing is preserved.
> - {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
> - {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the
> projected expression is non-reducible.
> - {{ProjectFilterTransposeRule}}, {{ProjectJoinTransposeRule}},
> {{ProjectCorrelateTransposeRule}}, {{AggregateProjectMergeRule}} — confirmed
> safe on tested patterns.
> - {{FilterSetOpTransposeRule}} — textual count of {{RAND(}} doubles when the
> filter is cloned to each {{UNION ALL}} branch, but this is NOT a semantic
> bug: any single row only flows through one branch's filter, so per-row
> evaluation count is unchanged.
> h3. Note on dynamic vs non-deterministic ({{CURRENT_TIMESTAMP}})
> Calcite already distinguishes "non-deterministic" from "dynamic" via two
> independent operator flags, although the naming is non-obvious:
> ||Operator||{{isDeterministic()}}||{{isDynamicFunction()}}||Semantics||
> |{{RAND()}}|false|false|Different value per call site, even within one
> statement|
> |{{CURRENT_TIMESTAMP}}|true (default; not overridden)|true|Same value at
> every call site within one statement; can differ across executions|
> So in Calcite's model {{CURRENT_TIMESTAMP}} is "deterministic per statement"
> — duplicating its call sites is semantically harmless because all occurrences
> resolve to the same instant at runtime. The {{isDynamicFunction()}} flag
> governs plan-cache invalidation across executions, not duplication safety
> within one execution.
> The fix uses {{RexUtil.isDeterministic(...)}}, which already encodes this
> distinction correctly: {{RAND}} is blocked from duplication;
> {{CURRENT_TIMESTAMP}} (and the rest of {{SqlAbstractTimeFunction}}) is left
> alone. No refinement
> of Calcite's determinism notion is required for this fix, although the
> comment thread notes that the two-flag model is worth revisiting more broadly
> — that is out of scope here.
> h3. Proposed fix
> Four small changes, all in the spirit of the existing "skip rule if it would
> touch non-determinism" pattern already used elsewhere in Calcite
> ({{ProjectReduceExpressionsRule}}, the {{containsOver}} skip in transpose
> rules, etc.):
> {{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per
> bottom slot; if any bottom expression is non-deterministic
> ({{!RexUtil.isDeterministic(...)}}) and is referenced more than once, return
> {{null}}. Fixes
> {{ProjectMergeRule}}, {{FilterProjectTransposeRule}}'s helper call, and the
> {{RelBuilder}} bloat merge in one place.
> {{FilterProjectTransposeRule}} — additionally skip when any projected
> expression is non-deterministic, parallel to the existing {{containsOver}}
> skip. Required because the rule re-emits the original projection above the
> new filter.
> {{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel
> to the existing {{containsOver}} skip on each side.
> {{SemiJoinProjectTransposeRule}} — same skip as {{JoinProjectTransposeRule}}
> (uses the same {{mergePrograms}} + {{expandLocalRef}} pattern).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)