[
https://issues.apache.org/jira/browse/CALCITE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruben Q L updated CALCITE-6338:
---
Description:
{{RelMdCollation#project}} can return an incomplete list of collations.
Suppose we have a Project (or a Calc) that projects the following expressions
(notice that input field $2 becomes both $1 and $2 after the projection): $0,
$2, $2, $3
The Project's input has collation [2, 3].
To calculate the Project's own collation, {{RelMdCollation#project}} is called
and computes a Multimap {{targets}}, because, as in this case, a given "source
field" (e.g. 2) can have multiple project targets (e.g. 1 and 2). However, when
the collation is computed, *only the first target is considered* (and the rest
are discarded):
{code}
public static @Nullable List<RelCollation> project(RelMetadataQuery mq,
    RelNode input, List<? extends RexNode> projects) {
  ...
    for (RelFieldCollation ifc : ic.getFieldCollations()) {
      final Collection<Integer> integers = targets.get(ifc.getFieldIndex());
      if (integers.isEmpty()) {
        continue loop; // cannot do this collation
      }
      fieldCollations.add(ifc.withFieldIndex(integers.iterator().next())); // <-- HERE!!
    }
{code}
Because of this, the Project's collation will be [1, 3], but there is also
another valid one ([2, 3]), so the correct (complete) result should be:
[1, 3], [2, 3]
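
As an illustration of the expected result, here is a minimal, self-contained
sketch that expands every target of every source field instead of taking only
the first one. It does not use Calcite's classes: plain integer lists stand in
for collations, and the class and method names ({{ProjectCollations}},
{{targets}}, {{project}}) are hypothetical. It is not the actual fix, only a
model of the behaviour described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProjectCollations {
  /** Maps each source field to every output position that projects it. */
  static Map<Integer, List<Integer>> targets(List<Integer> projectedInputRefs) {
    Map<Integer, List<Integer>> targets = new LinkedHashMap<>();
    for (int i = 0; i < projectedInputRefs.size(); i++) {
      targets.computeIfAbsent(projectedInputRefs.get(i), k -> new ArrayList<>()).add(i);
    }
    return targets;
  }

  /** Expands one input collation into all collations valid after the projection. */
  static List<List<Integer>> project(List<Integer> inputCollation,
      Map<Integer, List<Integer>> targets) {
    List<List<Integer>> result = new ArrayList<>();
    if (inputCollation.isEmpty()) {
      return result; // nothing to translate
    }
    result.add(new ArrayList<>());
    for (int sourceField : inputCollation) {
      List<Integer> fieldTargets = targets.get(sourceField);
      if (fieldTargets == null) {
        return new ArrayList<>(); // field is not projected: cannot do this collation
      }
      // Extend every partial collation with every target of this source field.
      List<List<Integer>> next = new ArrayList<>();
      for (List<Integer> prefix : result) {
        for (int target : fieldTargets) {
          List<Integer> extended = new ArrayList<>(prefix);
          extended.add(target);
          next.add(extended);
        }
      }
      result = next;
    }
    return result;
  }

  public static void main(String[] args) {
    // Projection $0, $2, $2, $3 over an input with collation [2, 3].
    Map<Integer, List<Integer>> targets = targets(List.of(0, 2, 2, 3));
    System.out.println(project(List.of(2, 3), targets)); // prints [[1, 3], [2, 3]]
  }
}
```

Note that this expansion is a cartesian product over the targets, so the number
of collations can grow quickly when several sorted fields are each projected
many times; a real implementation would probably want to bound it.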
This seems like a minor problem, but it can be the root cause of more serious
issues. For instance, I currently have a scenario (not easy to reproduce in a
unit test) where a certain plan with a certain combination of rules in a
HepPlanner results in a StackOverflow because SortJoinTransposeRule fires
indefinitely. The root cause is that, after its first application, the rule
does not detect that the Join's left input is already sorted (as a result of
that first application), because there is a "problematic" Project on it that
shows the problem described above. The Project returns only one collation, and
the discarded second collation is precisely the Sort's collation, i.e. the one
that would prevent SortJoinTransposeRule from being re-applied over and over.
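
To make the consequence concrete, the following sketch models an "is the input
already sorted?" test against the two collation lists from the example above.
It is a simplified stand-in and not Calcite's code: the class and method names
are hypothetical, collations are plain integer lists, "satisfies" is assumed to
mean "starts with the required sort keys", and [2, 3] is assumed to be the
Sort's collation purely for illustration.

```java
import java.util.List;

public class AlreadySortedCheck {
  /** Returns true if some known collation starts with the required sort keys. */
  static boolean alreadySorted(List<List<Integer>> knownCollations,
      List<Integer> required) {
    for (List<Integer> known : knownCollations) {
      if (known.size() >= required.size()
          && known.subList(0, required.size()).equals(required)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<Integer> sortCollation = List.of(2, 3); // assumed Sort collation

    // Incomplete metadata: only [1, 3] is reported, so the check fails and a
    // rule relying on it would consider itself applicable again.
    System.out.println(alreadySorted(List.of(List.of(1, 3)), sortCollation));

    // Complete metadata: [2, 3] is reported as well, so the check succeeds.
    System.out.println(
        alreadySorted(List.of(List.of(1, 3), List.of(2, 3)), sortCollation));
  }
}
```

With the incomplete list the check returns false on every pass, which is the
kind of condition under which a rule can keep re-firing.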
was:
{{RelMdCollation#project}} can return an incomplete list of collations.
Let us say we have a Project that projects the following expressions (notice
that $2 will become $1 and $2 after the projection): $0, $2, $2, $3
The Project's input has collation [2, 3]
In order to calculate the Project's own collation, {{RelMdCollation#project}}
will be called, and a MultiMap targets will be computed because, as in this
case, a certain "source field" (e.g. 2) can have multiple project targets (e.g.
1 and 2). However, when the collation is being computed, *only the first target
will be considered* (and the rest will be discarded):
{code}
public static @Nullable List<RelCollation> project(RelMetadataQuery mq,
    RelNode input, List<? extends RexNode> projects) {
  ...
    for (RelFieldCollation ifc : ic.getFieldCollations()) {
      final Collection<Integer> integers = targets.get(ifc.getFieldIndex());
      if (integers.isEmpty()) {
        continue loop; // cannot do this collation
      }
      fieldCollations.add(ifc.withFieldIndex(integers.iterator().next())); // <-- HERE!!
    }
{code}
Because of this, the Project's collation will be [1, 3], but there is also
another valid one ([2, 3]), so the correct (complete) result should be:
[1, 3], [2, 3]
This seems a minor problem, but it can be the root cause of more relevant
issues. For instance, at the moment I have a scenario (not so easy to reproduce
with a unit test) where a certain plan with a certain combination of rules in a
HepPlanner results in a StackOverflow due to SortJoinTransposeRule being fired
infinitely. The root cause is that, after the first application, the rule does
not detect that the Join's left input is already sorted (due to the previous
application of the rule), because there is a "problematic" Project on it (that
shows the problem described above), which returns only one collation, whereas
the second collation (the one being discarded) is the Sort's collation, so it
would be one that would prevent the SortJoinTransposeRule from being re-applied
over and over.
> RelMdCollation#project can return an incomplete list of collations in the
> presence of aliasing
> --
>
> Key: CALCITE-6338
> URL: https://issues.apache.org/jira/browse/CALCITE-6338
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.36.0
>Reporter: Ruben Q L
>Assignee: Ruben Q L
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.37.0
>
>
> {{RelMdCollation#project}} can return an incomplete list