If the general guideline is to not use composite/multiple traits then we should deprecate and remove them eventually.
Personally I haven't used them in practice and I haven't seen a jira around this topic for quite some time now. If we deprecate them and people complain that we are removing a useful feature then we can reconsider. Best, Stamatis On Tue, May 25, 2021, 8:43 PM Haisheng Yuan <hy...@apache.org> wrote: > Negative, currently. > > It is indeed complicated, but still acceptable IMO. If the community have > consensus on this matter, we can create a JIRA to add the feature to trait > and handle equivalent keys trait satisfaction in trait.satisfy() and column > remapping in trait.apply(), hope that can ease some pain, if you happen to > use Calcite's default distribution/collation implementation. > > Regards, > Haisheng Yuan > > On 2021/05/25 17:45:32, Vladimir Ozerov <ppoze...@gmail.com> wrote: > > Hi Haisheng, > > > > Thank you for the advice. This is exactly how I designed distribution at > > the moment (the approach 2 from my original email) - as a List<int[]> > > instead of just int[]. My main concern was the increased complexity of > the > > trait propagation/derivation, as I have to manage these nested lists by > > hand. Nevertheless, it works well. So I hoped that there are better > > built-in approaches that I may use. If the answer is negative, I'll > > continue using the original approach, when multiple alternatives managed > > manually. > > > > Regards, > > Vladimir. > > > > вт, 25 мая 2021 г. в 20:30, Haisheng Yuan <hy...@apache.org>: > > > > > Hi Vladimir, > > > > > > Glad to see you raised the question. > > > > > > Here is the advice: > > > Do not use RelMultipleTrait/RelCompositeTrait, which is fundamentally > > > flawed and has many bugs. It can't work properly no matter for > top-down or > > > bottom-up. > > > > > > Instead, we need to add equivalent keys bitmap as the property of > physical > > > trait like RelCollation, RelDistribution. > > > > > > For example: > > > class RelDistributionImpl { > > > // list of distribution keys > > > private ImmutableIntList keys; > > > > > > // list of equivalent bitset for each distribution key > > > private ImmutableList<ImmutableBitSet> equivBitSets; > > > } > > > > > > In the trait satisfy and column remapping, we also need to take > equivalent > > > keys into consideration. Some of the work need to be done in Calcite > core > > > framework. > > > > > > Greenplum Orca optimizer has similar strategy: > > > > > > > https://github.com/greenplum-db/gporca/blob/master/libgpopt/include/gpopt/base/CDistributionSpecHashed.h#L44 > > > > > > Regards, > > > Haisheng Yuan > > > > > > On 2021/05/25 15:37:32, Vladimir Ozerov <ppoze...@gmail.com> wrote: > > > > Hi, > > > > > > > > Consider the distributed SQL engine that uses a distribution > property to > > > > model exchanges. Consider the following physical tree. To do the > > > > distributed join, we co-locate tuples using the equijoin key. Now the > > > Join > > > > operator has two equivalent distributions - [a1] and [b1]. It is > critical > > > > to expose both distributions so that the top Aggregate can take > advantage > > > > of the co-location. > > > > > > > > Aggregate[group=b1] > > > > DistributedJoin[a.a1=b.b1] // SHARDED[a1], SHARDED[b1] > > > > Input[a] // SHARDED[a1] > > > > Input[b] // SHARDED[b1] > > > > > > > > A similar example for the Project: > > > > Aggregate[group=$1] > > > > Project[$0=a, $1=a] // SHARDED[$0], SHARDED[$1] > > > > Input // SHARDED[a] > > > > > > > > The question is how to model this situation properly? > > > > > > > > First, it seems that RelMultipleTrait and RelCompositeTrait were > designed > > > > to handle this situation. However, I couldn't make them work with the > > > > top-down optimizer. The reason is that when we register a RelNode > with a > > > > composite trait in MEMO, VolcanoPlanner flattens the composite trait > into > > > > the default trait value in RelSet.add -> RelTraitSet.simplify. That > is, > > > the > > > > trait [SHARDED[a], SHARDED[b]] will be converted to [ANY] so that the > > > > original traits could not be derived in the PhysicalNode.derive > methods. > > > > > > > > Second, we may try to model multiple sharding keys in a single > trait. But > > > > this complicates the implementation of > PhysicalNode.passThrough/derive > > > > significantly. > > > > SHARDED[a1, a2], SHARDED[b1, b2] -> SHARDED[[a1, a2], [b1, b2]] > > > > > > > > Third, we may expose multiple traits using metadata. > RelMdDistribution > > > > would not work, because it exposes only a single distribution. But a > > > custom > > > > handler may potentially fix that. However, it will not be integrated > with > > > > the top-down optimizer still, which makes the idea questionable. > > > > > > > > To summarize, it seems that currently there is no easy way to handle > > > > composite traits with a top-down optimizer. I wonder whether someone > from > > > > the devlist already solved similar issues in Apache Calcite or other > > > > optimizers. If so, what was the approach or best practices? > Intuitively, > > > it > > > > seems that RelMultipleTrait/RelCompositeTrait approach might be the > way > > > to > > > > go. But why do we replace the original composite trait set with the > > > default > > > > value in the RelTraitSet.simplify routine? > > > > > > > > Regards, > > > > Vladimir. > > > > > > > > > >