[ https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835858#comment-17835858 ]
Brachi Packter edited comment on CALCITE-6357 at 4/10/24 7:09 PM:
------------------------------------------------------------------

> If the number of fields does not match, that's probably a problem on your
> end. RelBuilder almost always requires number of fields to match.

Why? Isn't it a valid case to have a row with a wider schema than what you actually need to select? (E.g. group-by queries: select one dimension from the row and compute a count/sum on it.)

> At RelBuilder#2125 it seems that force is false. For the behavior you want,
> force would need to be true

I can't see where I can pass force, only here:
https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
but it looks like it should be false in order for the fields to be renamed later on (and the identity check, isIdentity, should return true).

was (Author: brachi_packter):

> If the number of fields does not match, that's probably a problem on your
> end. RelBuilder almost always requires number of fields to match.

Why? Isn't it a valid case to have a row with a wider schema than what you actually need to select? (E.g. group-by queries: select one dimension from the row and compute a count/sum on it.)

> Calcite enforces select arguments count to be same as row schema fields which
> causes aliases to be ignored
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6357
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6357
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Brachi Packter
>            Priority: Major
>
> Calcite's RelBuilder.projectNamed checks whether the number of expressions in
> the select is identical to the number of fields in the schema. If not, it
> creates a project with the fields as they appear in the select, meaning that
> if they have aliases, they are returned with their aliases.
> Here, it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using the RexUtil.isIdentity method:
> ```
> public static boolean isIdentity(List<? extends RexNode> exps,
>     RelDataType inputRowType) {
>   return inputRowType.getFieldCount() == exps.size()
>       && containIdentity(exps, inputRowType, Litmus.IGNORE);
> }
> ```
> This is the problematic part: `inputRowType.getFieldCount() == exps.size()`
> If they are identical and the fields come with aliases, the aliases are
> ignored in the "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and the alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact Calcite queries, but Apache Beam does some optimization
> on top of it,
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is suddenly returned without
> the correct column field names.
> I believe the isIdentity check can cause more issues if not fixed. We need
> to understand why it is enforced. Isn't it valid for the select to have a
> different number of fields from what we have in the schema?
> In our case we have one big row and we run different queries on it, each
> with different fields in the select.
> Beam issue:
> https://github.com/apache/beam/issues/30498
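
To make the count check discussed above concrete, here is a minimal standalone sketch. It is not Calcite code: projections are modeled as plain input-field indexes rather than RexNode objects, type comparison is left out, and the class name IdentityCheckSketch is hypothetical. It only illustrates how the `getFieldCount() == exps.size()` condition separates a full identity projection from a prefix of a wider row.

```java
import java.util.List;

// Simplified stand-in for the isIdentity check quoted in the issue.
// Assumption of this sketch: each projected expression is represented
// by the index of the input field it references.
final class IdentityCheckSketch {

  // True if expression i references input field i for every i,
  // i.e. the projection is a prefix of the input row.
  static boolean containIdentity(List<Integer> exps, int inputFieldCount) {
    if (exps.size() > inputFieldCount) {
      return false;
    }
    for (int i = 0; i < exps.size(); i++) {
      if (exps.get(i) != i) {
        return false;
      }
    }
    return true;
  }

  // Same shape as the quoted method: the field counts must match
  // in addition to the prefix check.
  static boolean isIdentity(List<Integer> exps, int inputFieldCount) {
    return inputFieldCount == exps.size()
        && containIdentity(exps, inputFieldCount);
  }

  public static void main(String[] args) {
    int rowWidth = 3; // a row with three fields
    // Selecting all three fields in order: identity.
    System.out.println(isIdentity(List.of(0, 1, 2), rowWidth));
    // Selecting only the first two fields: a prefix, but not identity
    // because the counts differ.
    System.out.println(isIdentity(List.of(0, 1), rowWidth));
    System.out.println(containIdentity(List.of(0, 1), rowWidth));
    // Same count but reordered fields: not identity.
    System.out.println(isIdentity(List.of(1, 0, 2), rowWidth));
  }
}
```

In this model, a select over fewer fields than the row has never counts as an identity, which is the narrow-select-on-a-wide-row case described in the comment.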