[
https://issues.apache.org/jira/browse/CALCITE-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047183#comment-18047183
]
Silun Dong commented on CALCITE-7340:
-------------------------------------
I'm not sure I fully understand your solution, so here is my own
understanding:
It seems to me that CorrelationId and scope are closely related. The
variablesSet property of Project and Filter usually contains either 0 or 1
element (I haven’t encountered other cases so far). If there is one
CorrelationId, it represents a variable coming from the input that is
referenced by the node's expressions (for my understanding of Join, please
refer to the [discussion in
github|https://github.com/apache/calcite/pull/4691#discussion_r2635417739] over
CALCITE-7336). In other words, each scope corresponds to a unique
CorrelationId. This idea is also supported by the logic of SqlToRelConverter:
when generating Project/Filter, it collects the CorrelationIds used by the
node’s expressions. If an id belongs to the current scope, it will be added to
the variablesSet property; otherwise, the variablesSet is empty, meaning the
free variable belongs to an outer scope.
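As a rough illustration of the rule above, here is a minimal sketch in Python. It is a toy model, not Calcite code: the CorrelationId class and the variables_set function are hypothetical stand-ins, and the assumption that a scope defines exactly one id is the one stated in this comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrelationId:
    """Toy stand-in for a correlation variable id such as $cor0 (hypothetical)."""
    id: int


def variables_set(used_ids, scope_id):
    """Keep only the ids used by the node's expressions that the current scope defines.

    An empty result means every free variable belongs to an outer scope.
    """
    return frozenset(i for i in used_ids if i == scope_id)


cor0, cor1 = CorrelationId(0), CorrelationId(1)

# The expressions reference $cor0, and the current scope defines $cor0.
inner = variables_set({cor0}, scope_id=cor0)

# The expressions reference $cor0, but the current scope defines $cor1.
outer = variables_set({cor0}, scope_id=cor1)
```

Here inner holds the single id of the current scope, while outer is empty because the only free variable comes from an enclosing scope.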
Regarding the use of CorrelationId, my main focus is on subquery removal and
decorrelation. Taking Project and Filter as examples: when removing subqueries,
we collect the CorrelationIds used in the node’s expressions and check whether
they intersect with the variablesSet. If there is an intersection, the
expression is correlated with the current scope and a Correlate is produced;
otherwise, the expression is not correlated with the current scope and a Join
is produced (the Correlate has already been generated in an outer scope). The
decorrelation algorithm (at least the new one) is also based on this and can
correctly handle complex nested correlations.
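The intersection check described above could be sketched as follows. This is a toy Python model under my own assumptions: operator_for_subquery is a hypothetical name, and ids are plain strings rather than Calcite objects.

```python
def operator_for_subquery(used_ids, variables_set):
    """Pick the operator produced when a subquery is removed from a node.

    used_ids: correlation ids referenced by the node's expressions.
    variables_set: ids defined by the current scope (usually 0 or 1 element).
    """
    correlated_here = frozenset(used_ids) & frozenset(variables_set)
    # A non-empty intersection means the expression is correlated with this scope.
    return "Correlate" if correlated_here else "Join"


# The subquery references $cor0, which this Filter/Project defines.
same_scope = operator_for_subquery({"$cor0"}, {"$cor0"})

# The subquery references $cor0 from an outer scope; variablesSet is empty here,
# so the Correlate is assumed to have been generated further out.
outer_scope = operator_for_subquery({"$cor0"}, set())
```

In this model same_scope is "Correlate" and outer_scope is "Join", matching the two cases in the paragraph above.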
In my opinion, during the plan-rewrite phase it is good for each scope to
correspond to a unique CorrelationId (for the physical implementation phase,
perhaps see EnumerableBatchNestedLoopJoinRule, which I will not discuss
here). If this concept is changed, subquery removal and decorrelation
would likely be heavily affected.
Perhaps others have better insights; this is just for reference.
> The rules governing the use of CorrelationId values in plans are not fully
> specified
> ------------------------------------------------------------------------------------
>
> Key: CALCITE-7340
> URL: https://issues.apache.org/jira/browse/CALCITE-7340
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Mihai Budiu
> Priority: Minor
>
> This issue is really about the Calcite internal representation of Rel nodes.
> There have been several recent discussions about manipulating plans that
> contain CorrelationId values, and the conclusion seems to be that the rules
> governing the use of such variables are not clear.
> Ideally these rules should be spelled out in a specification, and there
> should be a tool to enforce them by validating plans. The JavaDoc for this
> tool may be the right place to write the specification. I don't expect that
> the specification will be long or complicated.
> RelBuilder may not be the right place to enforce such rules, because it
> usually does not have visibility over the entire plan, and some of these
> rules have to apply globally over entire plans.
> See CALCITE-5784, CALCITE-7045 and the discussion in github over CALCITE-7336
> for examples.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)