[ https://issues.apache.org/jira/browse/CALCITE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829666#comment-17829666 ]
Julian Hyde commented on CALCITE-6338:
--------------------------------------

Makes sense. I would mention 'in the presence of aliasing' (as in [aliasing|https://en.wikipedia.org/wiki/Aliasing_(computing)]) or something similar in the summary, and in your test case.

I would like to see a test case where there are multiple aliased columns. If 1 and 2 are aliases, and 3 and 4 are aliases, then [1 3] [1 4] [2 3] [2 4] are equivalent collations.

I don't like how in your implementation one line became 20. I used to understand that method; now I no longer do. Introduce abstractions so that the implementation is at most 2 or 3 lines.

> RelMdCollation#project can return an incomplete list of collations
> ------------------------------------------------------------------
>
>                 Key: CALCITE-6338
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6338
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.36.0
>            Reporter: Ruben Q L
>            Assignee: Ruben Q L
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.37.0
>
>
> {{RelMdCollation#project}} can return an incomplete list of collations.
> Let us say we have a Project that projects the following expressions (notice
> that $2 will become $1 and $2 after the projection): $0, $2, $2, $3
> The Project's input has collation [2, 3]
> In order to calculate the Project's own collation, {{RelMdCollation#project}}
> will be called, and a MultiMap targets will be computed because, as in this
> case, a certain "source field" (e.g. 2) can have multiple project targets
> (e.g. 1 and 2). However, when the collation is being computed, *only the
> first target will be considered* (and the rest will be discarded):
> {code}
> public static @Nullable List<RelCollation> project(RelMetadataQuery mq,
>     RelNode input, List<? extends RexNode> projects) {
> ...
>     for (RelFieldCollation ifc : ic.getFieldCollations()) {
>       final Collection<Integer> integers = targets.get(ifc.getFieldIndex());
>       if (integers.isEmpty()) {
>         continue loop; // cannot do this collation
>       }
>       fieldCollations.add(ifc.withFieldIndex(integers.iterator().next()));
>       // <-- HERE!!
>     }
> {code}
> Because of this, the Project's collation will be [1 3], but there is also
> another valid one ([2 3]), so the correct (complete) result should be: [1 3]
> [2 3]
> This seems a minor problem, but it can be the root cause of more relevant
> issues. For instance, at the moment I have a scenario (not so easy to
> reproduce with a unit test) where a certain plan with a certain combination
> of rules in a HepPlanner results in a StackOverflow due to
> SortJoinTransposeRule being fired infinitely. The root cause is that, after
> the first application, the rule does not detect that the Join's left input is
> already sorted (due to the previous application of the rule), because there
> is a "problematic" Project on it (that shows the problem described above),
> which returns only one collation, whereas the second collation (the one being
> discarded) is the Sort's collation, so it would be the one that prevents the
> SortJoinTransposeRule from being re-applied over and over.
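
For illustration only, here is a Calcite-free sketch of the expansion discussed above: every field of the input collation is replaced by each of its project targets, so the result is the cross product of the targets. The class name AliasedCollations and the method expand are hypothetical, plain integers stand in for RelFieldCollation, and source fields 5 and 6 in the second call are made-up inputs for the aliased pairs; this is not the code from the pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: integers stand in for RelFieldCollation. */
final class AliasedCollations {
  private AliasedCollations() {}

  /**
   * Maps one input collation through a projection.
   *
   * @param inputCollation source field indexes, in sort order
   * @param targets source field index to the output ordinals that project it
   * @return every equivalent output collation, or an empty list if some
   *     field of the input collation is not projected at all
   */
  static List<List<Integer>> expand(List<Integer> inputCollation,
      Map<Integer, List<Integer>> targets) {
    List<List<Integer>> result = new ArrayList<>();
    result.add(new ArrayList<>()); // start from a single empty prefix
    for (int source : inputCollation) {
      List<Integer> outputs = targets.getOrDefault(source, List.of());
      if (outputs.isEmpty()) {
        return List.of(); // cannot do this collation
      }
      // Extend every prefix built so far with every target of this field.
      List<List<Integer>> next = new ArrayList<>();
      for (List<Integer> prefix : result) {
        for (int output : outputs) {
          List<Integer> extended = new ArrayList<>(prefix);
          extended.add(output);
          next.add(extended);
        }
      }
      result = next;
    }
    return result;
  }

  public static void main(String[] args) {
    // Issue example: projects $0, $2, $2, $3 over an input sorted on [2, 3].
    System.out.println(expand(List.of(2, 3),
        Map.of(0, List.of(0), 2, List.of(1, 2), 3, List.of(3))));
    // prints [[1, 3], [2, 3]]

    // Comment example: outputs 1 and 2 alias one source field (assumed to be
    // 5), outputs 3 and 4 alias another (assumed to be 6).
    System.out.println(expand(List.of(5, 6),
        Map.of(5, List.of(1, 2), 6, List.of(3, 4))));
    // prints [[1, 3], [1, 4], [2, 3], [2, 4]]
  }
}
```

The number of collations grows as the product of the alias counts per field, which is one reason to keep this expansion behind a small helper rather than inline in the metadata handler.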