TL;DR
Both top-down physical TraitSet request and bottom-up TraitSet
derivation have their strongth and weakness, we propose 
on-demand TraitSet request to combine the above two, to reduce
the number of plan alternatives that are genereated, especially 
in distributed system.

e.g.
select * from foo join bar on f1=b1 and f2=b2 and f3=b3;

In non-distributed system, we can generate a sort merge join, 
requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.  
But if foo happens to be sorted by f3,f2,f1, we may miss the 
chance of making use of the delivered ordering of foo. Because
if we require bar to be sorted by b3,b2,b1, we don't need to
sort on foo anymore. There are so many choices, n!, not even
considering asc/desc and null direction. We can't request all
the possible traitsets in top-down way, and can't derive all the
possible traitsets in bottom-up way either.

We propose on-demand traitset request by adding a new type
of metadata DerivedTraitSets into the built-in metadata system.

List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)

In this metadata, every operator returns several possbile traitsets
that may be derived from this operator.

Using above query as an example, the tablescan on foo should
return traiset with collation on f3, f2, f1.

In physical implementation rules, e.g. the SortMergeJoinRule,
it gets possible traitsets from both child operators, uses the join
keys to eliminate useless traitsets, leaves out usefull traitsets,
and requests corresponding traitset on the other child.

This relies on the feature of AbstractConverter, which is turned
off by default, due to performance issue [1].

Thoughts?

[1] https://issues.apache.org/jira/browse/CALCITE-2970

Haisheng

Reply via email to