[ 
https://issues.apache.org/jira/browse/CALCITE-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089864#comment-18089864
 ] 

Weihua Zhang commented on CALCITE-7608:
---------------------------------------

The proposed {{LogicalSelectMany}} looks very close to
[Velox's {{UnnestNode}}|https://facebookincubator.github.io/velox/develop/operators.html#unnestnode],
which may be a useful reference since it already solves the exact problem
described here (avoiding the {{Uncollect}} + {{LogicalCorrelate}} pairing).

h3. Field-level correspondence

||Concept||LogicalSelectMany (proposed)||Velox UnnestNode||Calcite Uncollect (today)||
|Columns passed through unchanged|"preserve the other fields"|replicateVariables|not supported (needs an outer Correlate)|
|Columns to expand|SOME fields of the input|unnestVariables (array/map)|ALL fields|
|Expand multiple columns|yes|yes|yes|
|Multi-collection row count|zip-longest, pad with NULL (CALCITE-7583)|max cardinality, pad with NULL|zip after CALCITE-7583|
|Ordinality|WITH ORDINALITY|ordinalityName|WITH ORDINALITY|
|MAP expansion|key/value (2 cols)|key/value (2 cols)|key/value|

The key correspondence is between {{replicateVariables}} and the "preserve the
other fields" capability of {{LogicalSelectMany}}. This is precisely what
removes the need for the surrounding {{LogicalCorrelate}}: the pass-through
columns are carried by the operator itself instead of being threaded through a
correlation variable. Velox's physical execution (each pass-through value is
replicated once per expanded row) maps directly onto the Enumerable
{{SelectMany}} semantics referenced in this issue.
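
To make the row semantics in the table concrete, here is a minimal Python
sketch of the behavior, not Calcite or Velox code. The function name
{{select_many}} and its parameters are hypothetical; the {{replicate}} and
{{unnest}} arguments stand in for {{replicateVariables}} and
{{unnestVariables}}. Treating a NULL collection like an empty one is an
assumption of the sketch.

```python
from itertools import zip_longest


def select_many(rows, replicate, unnest):
    """Expand the `unnest` columns of each row, copying the `replicate` columns.

    rows      -- list of dicts, one dict per input row (hypothetical model)
    replicate -- names of columns passed through unchanged
    unnest    -- names of collection-valued columns to expand
    """
    out = []
    for row in rows:
        passthrough = {c: row[c] for c in replicate}
        # Assumption: a NULL (None) collection behaves like an empty one.
        collections = [row[c] or [] for c in unnest]
        # Zip to the longest collection, padding shorter ones with None (NULL).
        for values in zip_longest(*collections, fillvalue=None):
            out.append({**passthrough, **dict(zip(unnest, values))})
    return out


rows = [
    {"id": 1, "a": [10, 20], "b": ["x"]},
    {"id": 2, "a": [], "b": []},
]
result = select_many(rows, replicate=["id"], unnest=["a", "b"])
```

In this sketch the row with {{id = 1}} yields two output rows, the second with
{{b}} padded to NULL, and the row with {{id = 2}} yields none (INNER
semantics). No correlation variable is involved: the operator itself copies
{{id}} into every expanded row.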

h3. One detail from Velox worth considering

{{UnnestNode}} has an {{emptyUnnestValueName}} property that controls the 
empty-collection behavior:
* not set -> rows whose collection is empty produce no output (INNER semantics);
* set -> an output row is still produced for an empty collection (expanded 
columns NULL, {{emptyUnnestValue = true}}), i.e. OUTER unnest; when ordinality 
is also present, that row's ordinality is 0.

This corresponds to the SQL distinction between {{CROSS JOIN UNNEST}} and 
{{LEFT JOIN UNNEST ... ON TRUE}} (Trino/Presto OUTER UNNEST). If we want 
{{LogicalSelectMany}} to cover OUTER unnest as well, it may be worth exposing a 
similar flag; otherwise it can only express the INNER form.
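
The INNER/OUTER distinction can be sketched in a few lines of Python. This
follows the behavior as described in the bullets above and has not been
checked against the Velox implementation; the function name {{unnest}}, the
{{outer}} flag, and the {{ordinality}} and {{is_empty}} column names are
hypothetical stand-ins (the last one for the column named by
{{emptyUnnestValueName}}).

```python
def unnest(rows, replicate, column, outer=False):
    """Unnest one collection column with ordinality.

    outer=False -- rows with an empty collection produce no output (INNER).
    outer=True  -- such rows produce one output row with a NULL value,
                   ordinality 0, and the marker column set to True (OUTER).
    """
    out = []
    for row in rows:
        base = {c: row[c] for c in replicate}
        # Assumption: a NULL (None) collection behaves like an empty one.
        items = row[column] or []
        if not items:
            if outer:
                out.append({**base, column: None, "ordinality": 0, "is_empty": True})
            continue
        for i, item in enumerate(items, start=1):
            expanded = {**base, column: item, "ordinality": i}
            if outer:
                # The marker column exists only in the OUTER form.
                expanded["is_empty"] = False
            out.append(expanded)
    return out


rows = [{"id": 1, "xs": ["a", "b"]}, {"id": 2, "xs": []}]
inner = unnest(rows, ["id"], "xs")
outer = unnest(rows, ["id"], "xs", outer=True)
```

Here {{inner}} drops the row with {{id = 2}} entirely, while {{outer}} keeps
it as a single NULL-padded row. A single boolean on {{LogicalSelectMany}}
would be enough to select between the two forms in this model.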

h3. Note on layering

Velox's {{UnnestNode}} is a _physical_ operator, whereas {{LogicalSelectMany}} 
is a _logical_ one. The direct counterpart of {{UnnestNode}} would be the 
physical {{EnumerableSelectMany}}; the pair {{LogicalSelectMany}} -> 
{{EnumerableSelectMany}} is what lines up with Velox. This also suggests a 
clean push-down path for engines built on a Velox-style execution model.

> Introduce a SelectMany operator
> -------------------------------
>
>                 Key: CALCITE-7608
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7608
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.42.0
>            Reporter: Mihai Budiu
>            Assignee: Mihai Budiu
>            Priority: Minor
>
> Today UNNEST is implemented using the Uncollect operator. We propose adding 
> an alternative LogicalSelectMany operator, which generalizes Uncollect. 
> (Notice that Enumerable API already has a SelectMany.) The main difference 
> between Uncollect and SelectMany is that Uncollect unnests all the fields of 
> its input relation, whereas LogicalSelectMany would only unnest SOME of the 
> fields of the input collection, preserving the other ones in each output row.
> This distinction is very important, because:
>  * LogicalSelectMany can be directly and efficiently implemented using the 
> Enumerable SelectMany
>  * UNNEST used in a cross-join is implemented using an Uncollect and a 
> LogicalCorrelate. However, the same UNNEST can be represented using just one 
> LogicalSelectMany node
>  * Neither the old nor the new decorrelator can actually eliminate 
> LogicalCorrelate nodes that are paired with Uncollect. Using 
> LogicalSelectMany we can decorrelate many more plans.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
