[ 
https://issues.apache.org/jira/browse/CALCITE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083704#comment-18083704
 ] 

Mihai Budiu commented on CALCITE-7551:
--------------------------------------

There are a few other optimizations which will check for non-deterministic 
functions and do nothing if any is found.

But I want to point out that RAND() and CURRENT_TIMESTAMP() are different: the 
latter is non-deterministic, but all calls within the same transaction are 
supposed to return the same value, so it should be safe to duplicate. I think 
that Calcite needs some refined notions of non-determinism.

> Project/Filter/Join transpose and merge rules can duplicate non-deterministic 
> expressions (e.g. RAND())
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7551
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7551
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.41.0
>            Reporter: Darpan Lunagariya (e6data computing)
>            Assignee: Darpan Lunagariya (e6data computing)
>            Priority: Major
>
> h2. Description
> Several optimizer and RelBuilder-level transformations inline a 
> lower-projection expression into multiple positions of an upper expression 
> without checking whether the inlined expression is deterministic. When the 
> lower expression
> is non-deterministic (e.g. RAND(), CURRENT_TIMESTAMP, sequence NEXTVAL, UUID 
> generators), this silently converts one evaluation into many, changing query 
> semantics.
> h2. Minimal repro
> {code:sql}
> SELECT a, a + 1 AS b
> FROM (SELECT rand() AS a)
> {code}
> Current plan produced by SqlToRelConverter:
> {noformat}
> LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
> LogicalValues(tuples=[[{ 0 }]])
> {noformat}
> The two RAND() calls are evaluated independently, so B - A is no longer 
> always {{{}1{}}}. The expected plan preserves the single evaluation by 
> referencing the projected column:
> {noformat}
> LogicalProject(A=[$0], B=[+($0, 1)])
> LogicalProject(A=[RAND()])
>     LogicalValues(tuples=[[{ 0 }]])
> {noformat}
> h2. Affected code paths (confirmed by reproducer tests)
> ||Component||Mechanism||
> │{{RelOptUtil.pushPastProjectUnlessBloat}}│The shuttle returned by 
> {{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the 
> underlying projection. Only {{RexOver}} is guarded; determinism is not.|
> │{{RelBuilder.project_}} (bloat merge)│Calls {{pushPastProjectUnlessBloat}} 
> when merging adjacent {{Project}}s. This is why the bug surfaces in the very 
> first {{RelNode}} produced by {{SqlToRelConverter}}, before any rule fires.|
> │{{ProjectMergeRule}}│Calls {{pushPastProjectUnlessBloat}}. Test: 1 → 2 
> {{RAND()}} calls.|
> │{{FilterProjectTransposeRule}}│Calls {{pushPastProjectUnlessBloat}} on the 
> filter condition, and re-emits the original projection above the new filter. 
> Even with the helper guarded, the rebuilt {{Project(rand() AS a)}} on top of 
> {{Filter(rand() > ...)}} splits one evaluation into two. Test: 1 → 2.|
> │{{JoinProjectTransposeRule}}│Builds {{RexProgram}}s for projection + join 
> condition, merges them via {{RexProgramBuilder.mergePrograms}}, then calls 
> {{mergedProgram.expandLocalRef(condition)}}. The expansion flattens local 
> refs back into a {{RexNode}} tree, inlining the underlying expression at 
> every occurrence. Test: 1 → 3.|
> Confirmed unaffected (by the same tests)
>  - {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as 
> {{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}} 
> referenced multiple times. Sharing is preserved; useful property to think 
> about
> when designing a general fix.
>  - {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
>  - {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the 
> projected expression is non-reducible.
> h2. Proposed fix
> Three small changes, all guarding the existing inline-substitution paths 
> against duplication of non-deterministic sub-expressions:
> {{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per 
> bottom slot; if any bottom expression is non-deterministic 
> ({{{}!RexUtil.isDeterministic(...){}}}) and is referenced more than once, 
> return {{{}null{}}}. Fixes 
> {{{}ProjectMergeRule{}}}, {{{}FilterProjectTransposeRule{}}}'s helper call, 
> and the {{RelBuilder}} bloat merge in one place.
> {{FilterProjectTransposeRule}} — additionally skip when any projected 
> expression is non-deterministic, parallel to the existing {{containsOver}} 
> skip. Required because the rule re-emits the original projection above the 
> new filter.
> {{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel 
> to the existing {{containsOver}} skip on each side.
> h2. Reproducer
> A new test class {{NonDeterministicDuplicationTest}} builds each rule's input 
> programmatically via {{RelBuilder}} (with {{bloat = -1}} so the build-time 
> merge does not pre-flatten), applies a single rule via a {{{}HepPlanner{}}}, 
> and
> asserts the count of {{RAND(}} occurrences in the output plan does not 
> increase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to