[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835893#comment-17835893
 ] 

Julian Hyde commented on CALCITE-6357:
--------------------------------------

*Caveat*: I haven't read every word you've written above; I only scanned the 
Beam case. I'm correcting what seem to be mistaken assumptions, in the hope 
that it will allow you to diagnose your problem faster. I hope that I am not 
dissuading other Calcite community members who may have more time from jumping 
in to help.

{quote}I presume you would agree that names of output columns is as much part 
of data integrity as the values{quote}

No, I would not. Calcite does not commit to preserving column names, only their 
types and ordering. It recognizes duplicate relational expressions (via 
memoization), forms equivalence sets of relational expressions, and after 
optimization will return one of the relational expressions in that subset.
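
To make "types and ordering, not names" concrete, here is a minimal sketch in plain Java. It is not Calcite code: the `Field` class and the `sameTypesAndOrder` helper are hypothetical names invented for illustration, and type labels are modeled as plain strings. It treats a row type as an ordered list of (name, type) pairs and compares two row types while deliberately ignoring the names.

```java
import java.util.List;

public class RowTypeSketch {
    // Hypothetical stand-in for a row-type field: a name plus a type label.
    public static final class Field {
        public final String name;
        public final String type;

        public Field(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // True when both row types have the same types in the same order.
    // Field names are deliberately not compared.
    public static boolean sameTypesAndOrder(List<Field> a, List<Field> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).type.equals(b.get(i).type)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Field> original =
            List.of(new Field("x", "INTEGER"), new Field("y", "VARCHAR"));
        List<Field> aliased =
            List.of(new Field("a", "INTEGER"), new Field("b", "VARCHAR"));
        // Same types in the same order; only the names differ.
        System.out.println(sameTypesAndOrder(original, aliased)); // prints true
    }
}
```

Under this model, two expressions whose outputs differ only in column names count as interchangeable, which is why a consumer that needs specific names cannot rely on whichever equivalent expression comes back.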

{quote}isn't it a valid case to have a row with wider schema from what you 
actually need to select?{quote}

Sure, you can write "select x, y from aTableWithAHundredColumns". That's a 
Project (with two expressions) on a Scan (returning 100 columns). My point is 
that the Project knows that its input has 100 columns.
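
The field-count comparison can be sketched without Calcite on the classpath. The class below is a simplified, hypothetical model, not the real `RexUtil` implementation: a projection is reduced to a list of input column indexes, whereas the real method works on `RexNode` expressions and a `RelDataType`. It only illustrates why a two-column projection over a wider input is never an identity, and that aliases play no part in the check.

```java
import java.util.List;

public class IdentityCheckSketch {
    // Hypothetical model: a projection is a list of input column indexes,
    // e.g. [0, 1] means "select column 0, then column 1".
    public static boolean isIdentity(List<Integer> projectedIndexes,
            int inputFieldCount) {
        if (projectedIndexes.size() != inputFieldCount) {
            // Narrower or wider than the input: a genuine projection.
            return false;
        }
        for (int i = 0; i < projectedIndexes.size(); i++) {
            if (projectedIndexes.get(i) != i) {
                // A reordered or repeated column: also a genuine projection.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two columns out of a 100-column input.
        System.out.println(isIdentity(List.of(0, 1), 100)); // prints false
        // Every input column, in input order.
        System.out.println(isIdentity(List.of(0, 1), 2));   // prints true
        // Every input column, but reordered.
        System.out.println(isIdentity(List.of(1, 0), 2));   // prints false
    }
}
```

In this model only the second case is an identity, and it is the only case in which the projection adds nothing beyond what its input already provides.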

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6357
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6357
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Brachi Packter
>            Priority: Major
>
> Calcite's RelBuilder.projectNamed checks whether the number of expressions in 
> the select is identical to the number of schema fields. If it is not, it 
> creates a project with the fields as they appear in the select, so any 
> aliases they have are returned.
> The check for whether they are identical happens here:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
> public static boolean isIdentity(List<? extends RexNode> exps,
>     RelDataType inputRowType) {
>   return inputRowType.getFieldCount() == exps.size()
>       && containIdentity(exps, inputRowType, Litmus.IGNORE);
> }
> ```
> The problematic part is `inputRowType.getFieldCount() == exps.size()`.
> If the counts are identical and the select fields carry aliases, the aliases 
> are ignored later on in the "rename" method
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and the alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't affect Calcite queries, but Apache Beam applies an optimization 
> on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes the aliases to be ignored, so data is suddenly returned without 
> the correct column names.
> I believe the isIdentity check can cause more issues if it is not fixed. We 
> need to understand why it is enforced: isn't it valid for the select to have 
> a different number of fields from the schema?
> In our case we have one big row and we run different queries on it, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
