Excellent, very important discussion. This has been a major missing
feature for a long time. Let's be sure to get to a conclusion and
implement something.

>From the Volcano paper:

  "the search engine permits the optimizer implementor to specify
  a number of physical property vectors to be tried"

How would we achieve this? Would we add an API to RelOptRule? If so,
what would that API look like?

On Thu, Oct 17, 2019 at 11:54 PM Stamatis Zampetakis <zabe...@gmail.com> wrote:
>
> Hi Haisheng,
>
> This is an interesting topic but somehow in my mind I thought that this
> mechanism is already in place.
>
> When an operator (logical or physical) is created its traitset is
> determined in bottom-up fashion using the create
> static factory method present in almost all operators. In my mind this is
> in some sense the applicability function
> mentioned in [1].
>
> Now during optimization we proceed in top-down manner and we request
> certain traitsets from the operators.
> If it happens and they contain already the requested traits nothing needs
> to be done.
>
> In your example when we are about to create the sort-merge join we can
> check what traitsets are present in the inputs
> and if possible request those. Can you elaborate a bit more why do we need
> a new type of metadata?
>
> Anyway if we cannot do it at the moment it makes sense to complete the
> missing bits since what you are describing
> was already mentioned in the original design of the Volcano optimizer [1].
>
> "If a move to be pursued is the exploration of a normal query processing
> algorithm such as merge-join, its cost is calculated by the algorithm's
> cost function. The algorithm's applicability function determines the
> physical properly vectors for the algorithms inputs, and their costs and
> optimal plans are found by invoking FindBestPlan for the inputs. For some
> binary operators, the actual physical properties of the inputs are not as
> important as the consistency of physical properties among the inputs. For
> example, for a sort-based implementation of intersection, i.e., an
> algorithm very similar to merge-join, any sort order of the two inputs will
> suffice as long as the two inputs are sorted in the same way. Similarly,
> for a parallel join, any partitioning of join inputs across multiple
> processing nodes is acceptable if both inputs are partitioned using
> Compatible partitioning rules. For these cases, the search engine permits
> the optimizer implementor to specify a number of physical property vectors
> to be tried. For example, for the intersection of two inputs R and S with
> attributes A, B, and C where R is sorted on (A,B,C) and S is sorted on
> (B,A,C), both these sort orders can be specified by the optimizer
> implementor and will be optimized by the generated optimizer, while other
> possible sort orders, e.g., (C,B,A), will be ignored. " [1]
>
> Best,
> Stamatis
>
> [1]
> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
>
> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <h.y...@alibaba-inc.com>
> wrote:
>
> > TL;DR
> > Both top-down physical TraitSet request and bottom-up TraitSet
> > derivation have their strongth and weakness, we propose
> > on-demand TraitSet request to combine the above two, to reduce
> > the number of plan alternatives that are genereated, especially
> > in distributed system.
> >
> > e.g.
> > select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> >
> > In non-distributed system, we can generate a sort merge join,
> > requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > But if foo happens to be sorted by f3,f2,f1, we may miss the
> > chance of making use of the delivered ordering of foo. Because
> > if we require bar to be sorted by b3,b2,b1, we don't need to
> > sort on foo anymore. There are so many choices, n!, not even
> > considering asc/desc and null direction. We can't request all
> > the possible traitsets in top-down way, and can't derive all the
> > possible traitsets in bottom-up way either.
> >
> > We propose on-demand traitset request by adding a new type
> > of metadata DerivedTraitSets into the built-in metadata system.
> >
> > List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> >
> > In this metadata, every operator returns several possbile traitsets
> > that may be derived from this operator.
> >
> > Using above query as an example, the tablescan on foo should
> > return traiset with collation on f3, f2, f1.
> >
> > In physical implementation rules, e.g. the SortMergeJoinRule,
> > it gets possible traitsets from both child operators, uses the join
> > keys to eliminate useless traitsets, leaves out usefull traitsets,
> > and requests corresponding traitset on the other child.
> >
> > This relies on the feature of AbstractConverter, which is turned
> > off by default, due to performance issue [1].
> >
> > Thoughts?
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-2970
> >
> > Haisheng
> >
> >

Reply via email to