[ 
https://issues.apache.org/jira/browse/CALCITE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083710#comment-18083710
 ] 

Darpan Lunagariya (e6data computing) commented on CALCITE-7551:
---------------------------------------------------------------

Thanks [~mbudiu].

I updated the description accordingly by exploring more about CURRENT_TIMESTAMP.

> Project/Filter/Join transpose and merge rules can duplicate non-deterministic 
> expressions (e.g. RAND())
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7551
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7551
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.41.0
>            Reporter: Darpan Lunagariya (e6data computing)
>            Assignee: Darpan Lunagariya (e6data computing)
>            Priority: Major
>
> Several optimizer and {{RelBuilder}}-level transformations inline a 
> lower-projection expression into multiple positions of an upper expression 
> without checking whether the inlined expression is non-deterministic. When 
> the lower
> expression is non-deterministic in the strict sense (i.e. two calls within 
> the same statement may return different values — {{RAND()}}, sequence 
> {{NEXTVAL}}, UUID generators), this silently converts one evaluation into 
> many,
> changing query semantics.
> {{CURRENT_TIMESTAMP}} and the other time functions are not in scope: Calcite 
> considers them deterministic per statement (see the "Note on dynamic vs 
> non-deterministic" section below) and they are safe to duplicate.
> h3. Minimal repro
> {code:sql}
> SELECT a, a + 1 AS b
> FROM (SELECT rand() AS a)
> {code}
> h3. Current plan produced by {{SqlToRelConverter}}:
> {noformat}
> LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
> LogicalValues(tuples=[[{ 0 }]])
> {noformat}
> The two {{RAND()}} calls are evaluated independently, so {{B - A}} is no 
> longer always {{1}}. The expected plan preserves the single evaluation by 
> referencing the projected column:
> {noformat}
> LogicalProject(A=[$0], B=[+($0, 1)])
> LogicalProject(A=[RAND()])
>     LogicalValues(tuples=[[{ 0 }]])
> {noformat}
> h3. Affected code paths (confirmed by reproducer tests)
> ||Component||Mechanism||
> |{{RelOptUtil.pushPastProjectUnlessBloat}}|The shuttle returned by 
> {{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the 
> underlying projection. Only {{RexOver}} is guarded; determinism is not.|
> |{{RelBuilder.project_}} (bloat merge)|Calls {{pushPastProjectUnlessBloat}} 
> when merging adjacent {{Project}}s. This is why the bug surfaces in the very 
> first {{RelNode}} produced by {{SqlToRelConverter}}, before any rule fires.|
> |{{ProjectMergeRule}}|Calls {{pushPastProjectUnlessBloat}}. Test: 1 → 2 
> {{RAND()}} calls.|
> |{{FilterProjectTransposeRule}}|Calls {{pushPastProjectUnlessBloat}} on the 
> filter condition, and re-emits the original projection above the new filter. 
> Even with the helper guarded, the rebuilt {{Project(rand() AS a)}} on top of
> {{Filter(rand() > ...)}} splits one evaluation into two. Test: 1 → 2.|
> |{{JoinProjectTransposeRule}}|Builds {{RexProgram}}s for projection + join 
> condition, merges them via {{RexProgramBuilder.mergePrograms}}, then calls 
> {{mergedProgram.expandLocalRef(condition)}}. The expansion flattens local refs
> back into a {{RexNode}} tree, inlining the underlying expression at every 
> occurrence. Test: 1 → 3.|
> |{{SemiJoinProjectTransposeRule}}|Same mechanism as 
> {{JoinProjectTransposeRule}} — uses {{RexProgramBuilder.mergePrograms}} + 
> {{expandLocalRef}} to inline projected expressions into the semi-join 
> condition. Test: 1 → 2.|
> h3. Confirmed unaffected (by the same tests)
> - {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as 
> {{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}} 
> referenced multiple times. Sharing is preserved.
> - {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
> - {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the 
> projected expression is non-reducible.
> - {{ProjectFilterTransposeRule}}, {{ProjectJoinTransposeRule}}, 
> {{ProjectCorrelateTransposeRule}}, {{AggregateProjectMergeRule}} — confirmed 
> safe on tested patterns.
> - {{FilterSetOpTransposeRule}} — textual count of {{RAND(}} doubles when the 
> filter is cloned to each {{UNION ALL}} branch, but this is NOT a semantic 
> bug: any single row only flows through one branch's filter, so per-row
> evaluation count is unchanged.
> h3. Note on dynamic vs non-deterministic ({{CURRENT_TIMESTAMP}})
> Calcite already distinguishes "non-deterministic" from "dynamic" via two 
> independent operator flags, although the naming is non-obvious:
> ||Operator||{{isDeterministic()}}||{{isDynamicFunction()}}||Semantics||
> |{{RAND()}}|false|false|Different value per call site, even within one 
> statement|
> |{{CURRENT_TIMESTAMP}}|true (default; not overridden)|true|Same value at 
> every call site within one statement; can differ across executions|
> So in Calcite's model {{CURRENT_TIMESTAMP}} is "deterministic per statement" 
> — duplicating its call sites is semantically harmless because all occurrences 
> resolve to the same instant at runtime. The {{isDynamicFunction()}} flag
> governs plan-cache invalidation across executions, not duplication safety 
> within one execution.
> The fix uses {{RexUtil.isDeterministic(...)}}, which already encodes this 
> distinction correctly: {{RAND}} is blocked from duplication; 
> {{CURRENT_TIMESTAMP}} (and the rest of {{SqlAbstractTimeFunction}}) is left 
> alone. No refinement
> of Calcite's determinism notion is required for this fix, although the 
> comment thread notes that the two-flag model is worth revisiting more broadly 
> — that is out of scope here.
> h3. Proposed fix
> Four small changes, all in the spirit of the existing "skip rule if it would 
> touch non-determinism" pattern already used elsewhere in Calcite 
> ({{ProjectReduceExpressionsRule}}, the {{containsOver}} skip in transpose 
> rules, etc.):
> {{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per 
> bottom slot; if any bottom expression is non-deterministic 
> ({{!RexUtil.isDeterministic(...)}}) and is referenced more than once, return 
> {{null}}. Fixes 
> {{ProjectMergeRule}}, {{FilterProjectTransposeRule}}'s helper call, and the 
> {{RelBuilder}} bloat merge in one place.
> {{FilterProjectTransposeRule}} — additionally skip when any projected 
> expression is non-deterministic, parallel to the existing {{containsOver}} 
> skip. Required because the rule re-emits the original projection above the 
> new filter.
> {{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel 
> to the existing {{containsOver}} skip on each side.
> {{SemiJoinProjectTransposeRule}} — same skip as {{JoinProjectTransposeRule}} 
> (uses the same {{mergePrograms}} + {{expandLocalRef}} pattern).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to