[
https://issues.apache.org/jira/browse/CALCITE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083704#comment-18083704
]
Mihai Budiu commented on CALCITE-7551:
--------------------------------------
There are a few other optimizations which will check for non-deterministic
functions and do nothing if any is found.
But I want to point out that RAND() and CURRENT_TIMESTAMP() are different: the
latter is non-deterministic, but all calls within the same transaction are
supposed to return the same value, so it should be safe to duplicate. I think
that Calcite needs some refined notions of non-determinism.
> Project/Filter/Join transpose and merge rules can duplicate non-deterministic
> expressions (e.g. RAND())
> -------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-7551
> URL: https://issues.apache.org/jira/browse/CALCITE-7551
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.41.0
> Reporter: Darpan Lunagariya (e6data computing)
> Assignee: Darpan Lunagariya (e6data computing)
> Priority: Major
>
> h2. Description
> Several optimizer and RelBuilder-level transformations inline a
> lower-projection expression into multiple positions of an upper expression
> without checking whether the inlined expression is deterministic. When the
> lower expression
> is non-deterministic (e.g. RAND(), CURRENT_TIMESTAMP, sequence NEXTVAL, UUID
> generators), this silently converts one evaluation into many, changing query
> semantics.
> h2. Minimal repro
> {code:sql}
> SELECT a, a + 1 AS b
> FROM (SELECT rand() AS a)
> {code}
> Current plan produced by SqlToRelConverter:
> {noformat}
> LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
> LogicalValues(tuples=[[{ 0 }]])
> {noformat}
> The two RAND() calls are evaluated independently, so B - A is no longer
> always {{{}1{}}}. The expected plan preserves the single evaluation by
> referencing the projected column:
> {noformat}
> LogicalProject(A=[$0], B=[+($0, 1)])
> LogicalProject(A=[RAND()])
> LogicalValues(tuples=[[{ 0 }]])
> {noformat}
> h2. Affected code paths (confirmed by reproducer tests)
> ||Component||Mechanism||
> │{{RelOptUtil.pushPastProjectUnlessBloat}}│The shuttle returned by
> {{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the
> underlying projection. Only {{RexOver}} is guarded; determinism is not.|
> │{{RelBuilder.project_}} (bloat merge)│Calls {{pushPastProjectUnlessBloat}}
> when merging adjacent {{Project}}s. This is why the bug surfaces in the very
> first {{RelNode}} produced by {{SqlToRelConverter}}, before any rule fires.|
> │{{ProjectMergeRule}}│Calls {{pushPastProjectUnlessBloat}}. Test: 1 → 2
> {{RAND()}} calls.|
> │{{FilterProjectTransposeRule}}│Calls {{pushPastProjectUnlessBloat}} on the
> filter condition, and re-emits the original projection above the new filter.
> Even with the helper guarded, the rebuilt {{Project(rand() AS a)}} on top of
> {{Filter(rand() > ...)}} splits one evaluation into two. Test: 1 → 2.|
> │{{JoinProjectTransposeRule}}│Builds {{RexProgram}}s for projection + join
> condition, merges them via {{RexProgramBuilder.mergePrograms}}, then calls
> {{mergedProgram.expandLocalRef(condition)}}. The expansion flattens local
> refs back into a {{RexNode}} tree, inlining the underlying expression at
> every occurrence. Test: 1 → 3.|
> Confirmed unaffected (by the same tests)
> - {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as
> {{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}}
> referenced multiple times. Sharing is preserved; useful property to think
> about
> when designing a general fix.
> - {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
> - {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the
> projected expression is non-reducible.
> h2. Proposed fix
> Three small changes, all guarding the existing inline-substitution paths
> against duplication of non-deterministic sub-expressions:
> {{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per
> bottom slot; if any bottom expression is non-deterministic
> ({{{}!RexUtil.isDeterministic(...){}}}) and is referenced more than once,
> return {{{}null{}}}. Fixes
> {{{}ProjectMergeRule{}}}, {{{}FilterProjectTransposeRule{}}}'s helper call,
> and the {{RelBuilder}} bloat merge in one place.
> {{FilterProjectTransposeRule}} — additionally skip when any projected
> expression is non-deterministic, parallel to the existing {{containsOver}}
> skip. Required because the rule re-emits the original projection above the
> new filter.
> {{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel
> to the existing {{containsOver}} skip on each side.
> h2. Reproducer
> A new test class {{NonDeterministicDuplicationTest}} builds each rule's input
> programmatically via {{RelBuilder}} (with {{bloat = -1}} so the build-time
> merge does not pre-flatten), applies a single rule via a {{{}HepPlanner{}}},
> and
> asserts the count of {{RAND(}} occurrences in the output plan does not
> increase.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)