[
https://issues.apache.org/jira/browse/CALCITE-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089864#comment-18089864
]
Weihua Zhang commented on CALCITE-7608:
---------------------------------------
The proposed {{LogicalSelectMany}} looks very close to [Velox's
{{UnnestNode}}|https://facebookincubator.github.io/velox/develop/operators.html#unnestnode],
which may be a useful reference since it already solves the exact problem
described here (avoiding the {{Uncollect}} + {{LogicalCorrelate}} pairing).
h3. Field-level correspondence
||Concept||LogicalSelectMany (proposed)||Velox UnnestNode||Calcite Uncollect (today)||
|Columns passed through unchanged|"preserve the other fields"|replicateVariables|not supported (needs an outer Correlate)|
|Columns to expand|SOME fields of the input|unnestVariables (array/map)|ALL fields|
|Expand multiple columns|yes|yes|yes|
|Multi-collection row count|zip-longest, pad with NULL (CALCITE-7583)|max cardinality, pad with NULL|zip after CALCITE-7583|
|Ordinality|WITH ORDINALITY|ordinalityName|WITH ORDINALITY|
|MAP expansion|key/value (2 cols)|key/value (2 cols)|key/value|
The key match is {{replicateVariables}} == the "preserve the other fields"
capability of {{LogicalSelectMany}}. This is precisely what removes the need
for the surrounding {{LogicalCorrelate}}: the pass-through columns are carried
by the operator itself instead of being threaded through a correlation
variable. Velox's physical execution (replicate the pass-through columns once
for each expanded row) maps directly onto the Enumerable {{SelectMany}}
semantics referenced in this issue.
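To make the row-level semantics concrete, here is a minimal sketch in plain Java. It is not Calcite or Velox code: the class {{SelectManySketch}}, the method {{expand}} and its parameters are hypothetical names chosen for illustration. It assumes rows are plain lists, the expanded columns hold lists (or null), multiple collections are zipped with NULL padding as in the table above, and ordinality is 1-based.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not part of Calcite or Velox. */
final class SelectManySketch {
  /**
   * Expands one input row into zero or more output rows.
   * Columns at replicateIdx are copied into every output row; columns at
   * unnestIdx must hold a List (or null) and are zipped position by position,
   * padding the shorter collections with null.
   */
  static List<List<Object>> expand(List<Object> row, int[] replicateIdx,
      int[] unnestIdx, boolean withOrdinality) {
    // The output row count is the largest collection size (zip-longest).
    int max = 0;
    for (int i : unnestIdx) {
      List<?> c = (List<?>) row.get(i);
      if (c != null) {
        max = Math.max(max, c.size());
      }
    }
    List<List<Object>> out = new ArrayList<>();
    for (int pos = 0; pos < max; pos++) {
      List<Object> o = new ArrayList<>();
      // Pass-through columns are carried by the operator itself.
      for (int i : replicateIdx) {
        o.add(row.get(i));
      }
      // Expanded columns: take the element at pos, or null when exhausted.
      for (int i : unnestIdx) {
        List<?> c = (List<?>) row.get(i);
        o.add(c != null && pos < c.size() ? c.get(pos) : null);
      }
      if (withOrdinality) {
        o.add(pos + 1); // assumption: 1-based, as in SQL WITH ORDINALITY
      }
      out.add(o);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Object> row = Arrays.<Object>asList(
        7, Arrays.asList("a", "b", "c"), Arrays.asList(10, 20));
    for (List<Object> r : expand(row, new int[] {0}, new int[] {1, 2}, true)) {
      System.out.println(r);
    }
  }
}
```

Note that no correlation variable appears anywhere: the replicated column is read from the same input row that supplies the collections, which is the property that makes the outer {{LogicalCorrelate}} unnecessary.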
h3. One detail from Velox worth considering
{{UnnestNode}} has an {{emptyUnnestValueName}} property that controls the
empty-collection behavior:
* not set -> rows whose collection is empty produce no output (INNER semantics);
* set -> an output row is still produced for an empty collection (expanded
columns are NULL and the column named by {{emptyUnnestValueName}} is true),
i.e. OUTER unnest; when ordinality is also present, that row's ordinality is 0.
This corresponds to the SQL distinction between {{CROSS JOIN UNNEST}} and
{{LEFT JOIN UNNEST ... ON TRUE}} (Trino/Presto OUTER UNNEST). If we want
{{LogicalSelectMany}} to cover OUTER unnest as well, it may be worth exposing a
similar flag; otherwise it can only express the INNER form.
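The two behaviors can be sketched for a single expanded column as follows. This is an illustration under stated assumptions, not Velox or Calcite code: {{OuterUnnestSketch}}, {{unnest}} and the {{outer}} flag are hypothetical names, a null collection is treated like an empty one (an assumption), and for simplicity the marker column is always emitted as the last field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not part of Calcite or Velox. */
final class OuterUnnestSketch {
  /**
   * Unnests one collection next to one pass-through value.
   * Each output row is [key, element, ordinality, emptyMarker].
   * With outer == false an empty collection yields no rows (INNER);
   * with outer == true it yields one row with a NULL element,
   * ordinality 0 and the marker set to true (OUTER).
   */
  static List<List<Object>> unnest(Object key, List<?> items, boolean outer) {
    List<List<Object>> out = new ArrayList<>();
    // Assumption: a null collection behaves like an empty one.
    boolean empty = items == null || items.isEmpty();
    if (empty) {
      if (outer) {
        out.add(Arrays.<Object>asList(key, null, 0, true));
      }
      return out;
    }
    for (int i = 0; i < items.size(); i++) {
      // Regular rows: 1-based ordinality, marker false.
      out.add(Arrays.<Object>asList(key, items.get(i), i + 1, false));
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(unnest("k", new ArrayList<Object>(), false));
    System.out.println(unnest("k", new ArrayList<Object>(), true));
    System.out.println(unnest("k", Arrays.asList("x", "y"), true));
  }
}
```

A single boolean is enough to distinguish the two forms in this sketch, which suggests that an equivalent flag on {{LogicalSelectMany}} would be a small addition.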
h3. Note on layering
Velox's {{UnnestNode}} is a _physical_ operator, whereas {{LogicalSelectMany}}
is a _logical_ one. The direct counterpart of {{UnnestNode}} would be the
physical {{EnumerableSelectMany}}; the pair {{LogicalSelectMany}} ->
{{EnumerableSelectMany}} is what lines up with Velox. This also suggests a
clean push-down path for engines built on a Velox-style execution model.
> Introduce a SelectMany operator
> -------------------------------
>
> Key: CALCITE-7608
> URL: https://issues.apache.org/jira/browse/CALCITE-7608
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.42.0
> Reporter: Mihai Budiu
> Assignee: Mihai Budiu
> Priority: Minor
>
> Today UNNEST is implemented using the Uncollect operator. We propose adding
> an alternative LogicalSelectMany operator, which generalizes Uncollect.
> (Notice that Enumerable API already has a SelectMany.) The main difference
> between Uncollect and SelectMany is that Uncollect unnests all the fields of
> its input relation, whereas LogicalSelectMany would only unnest SOME of the
> fields of the input collection, preserving the other ones in each output row.
> This distinction is very important, because:
> * LogicalSelectMany can be directly and efficiently implemented using the
> Enumerable SelectMany
> * UNNEST used in a cross-join is implemented using an Uncollect and a
> LogicalCorrelate. However, the same UNNEST can be represented using just one
> LogicalSelectMany node
> * Neither the old nor the new decorrelator can actually eliminate
> LogicalCorrelate nodes that are paired with Uncollect. Using
> LogicalSelectMany we can decorrelate many more plans.