[ 
https://issues.apache.org/jira/browse/CALCITE-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293845#comment-14293845
 ] 

Vladimir Sitnikov commented on CALCITE-569:
-------------------------------------------

{quote}If the input has collations (a, b), (x, p), (x, y, z) and we're looking 
for (x, y) then we shouldn't stop at (x, p), should we?{quote}
We should not.
The code collects all the matching collations. For this case they would be (x) 
and (x, y).
Then the usage side picks {{collationList.get(0)}} (the first collation). 
I did not change that {{.get(0)}} part.
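
To make the "collect all the collations" step concrete, here is a minimal 
sketch. It is a toy model and not Calcite code: a collation is just a list of 
column names, and the class {{CollationSketch}} and its {{deduce}} method are 
hypothetical names invented for this illustration. It assumes that "looking 
for (x, y)" means that only columns x and y are kept, and that each input 
collation contributes its longest prefix made of kept columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model, not Calcite code: a collation is an ordered list of column names. */
final class CollationSketch {
  private CollationSketch() {}

  /**
   * For every input collation, keeps the longest prefix whose columns are all
   * among the kept columns. Empty prefixes and duplicates are dropped.
   */
  static List<List<String>> deduce(List<List<String>> inputCollations,
      List<String> keptColumns) {
    Set<List<String>> result = new LinkedHashSet<>();
    for (List<String> collation : inputCollations) {
      List<String> prefix = new ArrayList<>();
      for (String column : collation) {
        if (!keptColumns.contains(column)) {
          break; // ordering past a dropped column is no longer observable
        }
        prefix.add(column);
      }
      if (!prefix.isEmpty()) {
        result.add(prefix); // do not stop here: later collations may match too
      }
    }
    return new ArrayList<>(result);
  }

  public static void main(String[] args) {
    List<List<String>> input = Arrays.asList(
        Arrays.asList("a", "b"),
        Arrays.asList("x", "p"),
        Arrays.asList("x", "y", "z"));
    // prints [[x], [x, y]]
    System.out.println(deduce(input, Arrays.asList("x", "y")));
  }
}
```

In this model, a sort column that is not kept at all (as in the reported 
query, where the inner ORDER BY column is not in the SELECT list) simply 
yields no collation instead of an invalid index.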

{quote}. LogicalProject(x, y) on LogicalSort( y ) will indeed be sorted on y - 
but any code that creates a RelNode subclass can override.
{quote}
Note: it does not matter how you propagate the collation. The result/impact is 
the same whether you use PROPAGATE or compute the columns (see below).

That has a flip side:
1) If you state that LogicalProject has collation "y", then you effectively 
disallow physical implementations that violate that collation.
Consider, for instance, {{Sort(y, Project(y, Sort(y)))}}.
If Project has a collation, then SortRemoveRule transforms the plan to 
Project(y, Sort(y)). If the implementation then picks a non-collation-preserving 
strategy (like ParallelProject(y, Sort(y))), we are in trouble, since no one 
will re-establish the "required" sort order of the rows.

Are you sure all the implementations really treat "collation of a project" as a 
"must"?
Can you show how EnumerableCalc/Project honors the collation?

2) We might miss some optimizations. When we propagate the collations we stick 
with just the subset of plans that keep the collation intact.
If we allow shuffle-like plans, there is a possibility the plan can be more 
efficient (e.g. use hash-join instead of merge-join, etc).
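
The risk described in point 1 can be sketched with a toy model. This is not 
Calcite code: {{ParallelProjectSketch}} and its methods are hypothetical, and 
the "parallel" project is simulated deterministically by splitting the sorted 
input round-robin into partitions and concatenating the per-partition results. 
The projection itself is the identity on y, so only the row order changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model, not Calcite code: rows are the values of a single column y. */
final class ParallelProjectSketch {
  private ParallelProjectSketch() {}

  /**
   * Simulates a non-collation-preserving project: rows are dealt round-robin
   * to partitions, then the partitions are concatenated one after another.
   */
  static List<Integer> parallelProject(List<Integer> input, int partitions) {
    List<List<Integer>> buckets = new ArrayList<>();
    for (int i = 0; i < partitions; i++) {
      buckets.add(new ArrayList<>());
    }
    for (int i = 0; i < input.size(); i++) {
      buckets.get(i % partitions).add(input.get(i));
    }
    List<Integer> output = new ArrayList<>();
    for (List<Integer> bucket : buckets) {
      output.addAll(bucket);
    }
    return output;
  }

  /** Returns whether the rows are in non-decreasing order. */
  static boolean isSorted(List<Integer> rows) {
    for (int i = 1; i < rows.size(); i++) {
      if (rows.get(i - 1) > rows.get(i)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Integer> sortedByY = Arrays.asList(1, 2, 3, 4); // output of Sort(y)
    List<Integer> projected = parallelProject(sortedByY, 2);
    // prints [1, 3, 2, 4] false
    System.out.println(projected + " " + isSorted(projected));
  }
}
```

If the outer Sort(y) has already been removed because the logical Project 
claimed collation "y", nothing in such a plan restores the order.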

The "uniform" way to solve the problem would be to add both kinds of Project 
to the equivalence set (e.g. via some sort of rules that replace a collated 
Project with Sort(NonCollatedProject)).
That is not a trivial patch, so I propose to go with the first approach: "kill 
collations from LogicalProject while still allowing Physical nodes to 
implement them".

> ArrayIndexOutOfBoundsException when deducing collation
> ------------------------------------------------------
>
>                 Key: CALCITE-569
>                 URL: https://issues.apache.org/jira/browse/CALCITE-569
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.0.0-incubating
>            Reporter: Aman Sinha
>            Assignee: Julian Hyde
>         Attachments: 
> 0001-CALCITE-569-Create-a-Project-with-empty-collation-if.patch, 
> 0001-Properly-track-collation-trait-for-select-a-from-.-o.patch, 
> 0001-Properly-track-collation-trait-for-select-a-from-.-o.patch, 
> PlanTest.java.diff
>
>
> If a subquery has an ORDER BY on a column that is not in the SELECT list and 
> the outer query does another ORDER BY, Calcite encounters an 
> ArrayIndexOutOfBoundsException when deducing collation. 
> In PlannerTest, I created a simple test by first adding the following traits: 
>  {code}
>           List<RelTraitDef> traitDefs = new ArrayList<RelTraitDef>();
>           traitDefs.add(ConventionTraitDef.INSTANCE);
>           traitDefs.add(RelCollationTraitDef.INSTANCE);
> {code}
> And ran the following query: 
> {code}
> select t.psPartkey from (select ps.psPartkey from `tpch`.`partsupp` ps order 
> by ps.psSupplyCost) t order by t.psPartkey
> {code}
> {code}
> java.lang.ArrayIndexOutOfBoundsException: -1
>       at 
> org.apache.calcite.rex.RexProgram.deduceCollations(RexProgram.java:589)
>       at org.apache.calcite.rex.RexProgram.getCollations(RexProgram.java:558)
>       at 
> org.apache.calcite.plan.RelOptUtil.createProject(RelOptUtil.java:2685)
>       at 
> org.apache.calcite.plan.RelOptUtil.createProject(RelOptUtil.java:2623)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelConverter.java:3571)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:613)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:568)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2929)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:526)
>       at org.apache.calcite.prepare.PlannerImpl.convert(PlannerImpl.java:189)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
