[
https://issues.apache.org/jira/browse/CALCITE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Darpan Lunagariya (e6data computing) updated CALCITE-7551:
----------------------------------------------------------
Description:
h2. Description
Several optimizer and RelBuilder-level transformations inline a
lower-projection expression into multiple positions of an upper expression
without checking whether the inlined expression is deterministic. When the
lower expression
is non-deterministic (e.g. RAND(), CURRENT_TIMESTAMP, sequence NEXTVAL, UUID
generators), this silently converts one evaluation into many, changing query
semantics.
h2. Minimal repro
{code:sql}
SELECT a, a + 1 AS b
FROM (SELECT rand() AS a)
{code}
Current plan produced by SqlToRelConverter:
{noformat}
LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
LogicalValues(tuples=[[{ 0 }]])
{noformat}
The two RAND() calls are evaluated independently, so B - A is no longer always
{{{}1{}}}. The expected plan preserves the single evaluation by referencing the
projected column:
{noformat}
LogicalProject(A=[$0], B=[+($0, 1)])
LogicalProject(A=[RAND()])
LogicalValues(tuples=[[{ 0 }]])
{noformat}
h2. Affected code paths (confirmed by reproducer tests)
||Component||Mechanism||
│{{RelOptUtil.pushPastProjectUnlessBloat}}│ The shuttle returned by
{{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the underlying
projection. Only {{RexOver}} is guarded; determinism is not.|
│{{RelBuilder.project_}} (bloat merge)│ Calls {{pushPastProjectUnlessBloat}}
when merging adjacent {{Project}}s. This is why the bug surfaces in the very
first {{RelNode}} produced by {{SqlToRelConverter}}, before any rule fires.|
│{{ProjectMergeRule}}│ Calls {{pushPastProjectUnlessBloat}}. Test: 1 → 2
{{RAND()}} calls.|
│{{FilterProjectTransposeRule}}│ Calls {{pushPastProjectUnlessBloat}} on the
filter condition, and re-emits the original projection above the new filter.
Even with the helper guarded, the rebuilt {{Project(rand() AS a)}} on top of
{{Filter(rand() > ...)}} splits one evaluation into two. Test: 1 → 2.|
│{{JoinProjectTransposeRule}}│ Builds {{RexProgram}}s for projection + join
condition, merges them via {{RexProgramBuilder.mergePrograms}}, then calls
{{mergedProgram.expandLocalRef(condition)}}. The expansion flattens local refs
back into a {{RexNode}} tree, inlining the underlying expression at every
occurrence. Test: 1 → 3.|
Confirmed unaffected (by the same tests)
- {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as
{{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}}
referenced multiple times. Sharing is preserved; useful property to think about
when designing a general fix.
- {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
- {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the
projected expression is non-reducible.
h2. Proposed fix
Three small changes, all guarding the existing inline-substitution paths
against duplication of non-deterministic sub-expressions:
{{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per
bottom slot; if any bottom expression is non-deterministic
({{{}!RexUtil.isDeterministic(...){}}}) and is referenced more than once,
return {{{}null{}}}. Fixes
{{{}ProjectMergeRule{}}}, {{{}FilterProjectTransposeRule{}}}'s helper call, and
the {{RelBuilder}} bloat merge in one place.
{{FilterProjectTransposeRule}} — additionally skip when any projected
expression is non-deterministic, parallel to the existing {{containsOver}}
skip. Required because the rule re-emits the original projection above the new
filter.
{{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel to
the existing {{containsOver}} skip on each side.
h2. Reproducer
A new test class {{NonDeterministicDuplicationTest}} builds each rule's input
programmatically via {{RelBuilder}} (with {{bloat = -1}} so the build-time
merge does not pre-flatten), applies a single rule via a {{{}HepPlanner{}}}, and
asserts the count of {{RAND(}} occurrences in the output plan does not increase.
||Heading 1||Heading 2||
|Col A1|Col A2|
was:
h2. Description
Several optimizer and RelBuilder-level transformations inline a
lower-projection expression into multiple positions of an upper expression
without checking whether the inlined expression is deterministic. When the
lower expression
is non-deterministic (e.g. RAND(), CURRENT_TIMESTAMP, sequence NEXTVAL, UUID
generators), this silently converts one evaluation into many, changing query
semantics.
h2. Minimal repro
{code:sql}
SELECT a, a + 1 AS b
FROM (SELECT rand() AS a)
{code}
Current plan produced by SqlToRelConverter:
{noformat}
LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
LogicalValues(tuples=[[{ 0 }]])
{noformat}
The two RAND() calls are evaluated independently, so B - A is no longer always
{{{}1{}}}. The expected plan preserves the single evaluation by referencing the
projected column:
{noformat}
LogicalProject(A=[$0], B=[+($0, 1)])
LogicalProject(A=[RAND()])
LogicalValues(tuples=[[{ 0 }]])
{noformat}
h2. Affected code paths (confirmed by reproducer tests)
||Component||Mechanism||
│{{RelOptUtil.pushPastProjectUnlessBloat}}│ The shuttle returned by
{{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the underlying
projection. Only {{RexOver}} is guarded; determinism is not.|
│{{RelBuilder.project_}} (bloat merge)│ Calls {{pushPastProjectUnlessBloat}}
when merging adjacent {{Project}}s. This is why the bug surfaces in the very
first {{RelNode}} produced by {{SqlToRelConverter}}, before any rule fires.|
│{{ProjectMergeRule}}│ Calls {{pushPastProjectUnlessBloat}}. Test: 1 → 2
{{RAND()}} calls.|
│{{FilterProjectTransposeRule}}│ Calls {{pushPastProjectUnlessBloat}} on the
filter condition, and re-emits the original projection above the new filter.
Even with the helper guarded, the rebuilt {{Project(rand() AS a)}} on top of
{{Filter(rand() > ...)}} splits one evaluation into two. Test: 1 → 2.|
│{{JoinProjectTransposeRule}}│ Builds {{RexProgram}}s for projection + join
condition, merges them via {{RexProgramBuilder.mergePrograms}}, then calls
{{mergedProgram.expandLocalRef(condition)}}. The expansion flattens local refs
back into a {{RexNode}} tree, inlining the underlying expression at every
occurrence. Test: 1 → 3.|
Confirmed unaffected (by the same tests)
- {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as
{{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}}
referenced multiple times. Sharing is preserved; useful property to think about
when designing a general fix.
- {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
- {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the
projected expression is non-reducible.
h2. Proposed fix
Three small changes, all guarding the existing inline-substitution paths
against duplication of non-deterministic sub-expressions:
{{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per
bottom slot; if any bottom expression is non-deterministic
({{{}!RexUtil.isDeterministic(...){}}}) and is referenced more than once,
return {{{}null{}}}. Fixes
{{{}ProjectMergeRule{}}}, {{{}FilterProjectTransposeRule{}}}'s helper call, and
the {{RelBuilder}} bloat merge in one place.
{{FilterProjectTransposeRule}} — additionally skip when any projected
expression is non-deterministic, parallel to the existing {{containsOver}}
skip. Required because the rule re-emits the original projection above the new
filter.
{{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel to
the existing {{containsOver}} skip on each side.
h2. Reproducer
A new test class {{NonDeterministicDuplicationTest}} builds each rule's input
programmatically via {{RelBuilder}} (with {{bloat = -1}} so the build-time
merge does not pre-flatten), applies a single rule via a {{{}HepPlanner{}}}, and
asserts the count of {{RAND(}} occurrences in the output plan does not increase.
> Project/Filter/Join transpose and merge rules can duplicate non-deterministic
> expressions (e.g. RAND())
> -------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-7551
> URL: https://issues.apache.org/jira/browse/CALCITE-7551
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.41.0
> Reporter: Darpan Lunagariya (e6data computing)
> Assignee: Darpan Lunagariya (e6data computing)
> Priority: Major
>
> h2. Description
> Several optimizer and RelBuilder-level transformations inline a
> lower-projection expression into multiple positions of an upper expression
> without checking whether the inlined expression is deterministic. When the
> lower expression
> is non-deterministic (e.g. RAND(), CURRENT_TIMESTAMP, sequence NEXTVAL, UUID
> generators), this silently converts one evaluation into many, changing query
> semantics.
> h2. Minimal repro
> {code:sql}
> SELECT a, a + 1 AS b
> FROM (SELECT rand() AS a)
> {code}
> Current plan produced by SqlToRelConverter:
> {noformat}
> LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
> LogicalValues(tuples=[[{ 0 }]])
> {noformat}
> The two RAND() calls are evaluated independently, so B - A is no longer
> always {{{}1{}}}. The expected plan preserves the single evaluation by
> referencing the projected column:
> {noformat}
> LogicalProject(A=[$0], B=[+($0, 1)])
> LogicalProject(A=[RAND()])
> LogicalValues(tuples=[[{ 0 }]])
> {noformat}
> h2. Affected code paths (confirmed by reproducer tests)
> ||Component||Mechanism||
> │{{RelOptUtil.pushPastProjectUnlessBloat}}│ The shuttle returned by
> {{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the
> underlying projection. Only {{RexOver}} is guarded; determinism is not.|
> │{{RelBuilder.project_}} (bloat merge)│ Calls {{pushPastProjectUnlessBloat}}
> when merging adjacent {{Project}}s. This is why the bug surfaces in the very
> first {{RelNode}} produced by {{SqlToRelConverter}}, before any rule fires.|
> │{{ProjectMergeRule}}│ Calls {{pushPastProjectUnlessBloat}}. Test: 1 → 2
> {{RAND()}} calls.|
> │{{FilterProjectTransposeRule}}│ Calls {{pushPastProjectUnlessBloat}} on the
> filter condition, and re-emits the original projection above the new filter.
> Even with the helper guarded, the rebuilt {{Project(rand() AS a)}} on top of
> {{Filter(rand() > ...)}} splits one evaluation into two. Test: 1 → 2.|
> │{{JoinProjectTransposeRule}}│ Builds {{RexProgram}}s for projection + join
> condition, merges them via {{RexProgramBuilder.mergePrograms}}, then calls
> {{mergedProgram.expandLocalRef(condition)}}. The expansion flattens local
> refs back into a {{RexNode}} tree, inlining the underlying expression at
> every occurrence. Test: 1 → 3.|
> Confirmed unaffected (by the same tests)
> - {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as
> {{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}}
> referenced multiple times. Sharing is preserved; useful property to think
> about
> when designing a general fix.
> - {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
> - {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the
> projected expression is non-reducible.
> h2. Proposed fix
> Three small changes, all guarding the existing inline-substitution paths
> against duplication of non-deterministic sub-expressions:
> {{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per
> bottom slot; if any bottom expression is non-deterministic
> ({{{}!RexUtil.isDeterministic(...){}}}) and is referenced more than once,
> return {{{}null{}}}. Fixes
> {{{}ProjectMergeRule{}}}, {{{}FilterProjectTransposeRule{}}}'s helper call,
> and the {{RelBuilder}} bloat merge in one place.
> {{FilterProjectTransposeRule}} — additionally skip when any projected
> expression is non-deterministic, parallel to the existing {{containsOver}}
> skip. Required because the rule re-emits the original projection above the
> new filter.
> {{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel
> to the existing {{containsOver}} skip on each side.
> h2. Reproducer
> A new test class {{NonDeterministicDuplicationTest}} builds each rule's input
> programmatically via {{RelBuilder}} (with {{bloat = -1}} so the build-time
> merge does not pre-flatten), applies a single rule via a {{{}HepPlanner{}}},
> and
> asserts the count of {{RAND(}} occurrences in the output plan does not
> increase.
> ||Heading 1||Heading 2||
> |Col A1|Col A2|
--
This message was sent by Atlassian Jira
(v8.20.10#820010)