To clarify. The purpose of this API would be to give the search engine
more high-level as to the goals it should focus on. The performance
issues described in https://issues.apache.org/jira/browse/CALCITE-2970
seem to be due to the planner "trying everything", and the solution
might be to add a bit more high-level structure.

On Fri, Oct 18, 2019 at 11:07 AM Julian Hyde <jh...@apache.org> wrote:
>
> Excellent, very important discussion. This has been a major missing
> feature for a long time. Let's be sure to get to a conclusion and
> implement something.
>
> From the Volcano paper:
>
>   "the search engine permits the optimizer implementor to specify
>   a number of physical property vectors to be tried"
>
> How would we achieve this? Would we add an API to RelOptRule? If so,
> what would that API look like?
>
> On Thu, Oct 17, 2019 at 11:54 PM Stamatis Zampetakis <zabe...@gmail.com> 
> wrote:
> >
> > Hi Haisheng,
> >
> > This is an interesting topic but somehow in my mind I thought that this
> > mechanism is already in place.
> >
> > When an operator (logical or physical) is created its traitset is
> > determined in bottom-up fashion using the create
> > static factory method present in almost all operators. In my mind this is
> > in some sense the applicability function
> > mentioned in [1].
> >
> > Now during optimization we proceed in top-down manner and we request
> > certain traitsets from the operators.
> > If it happens and they contain already the requested traits nothing needs
> > to be done.
> >
> > In your example when we are about to create the sort-merge join we can
> > check what traitsets are present in the inputs
> > and if possible request those. Can you elaborate a bit more why do we need
> > a new type of metadata?
> >
> > Anyway if we cannot do it at the moment it makes sense to complete the
> > missing bits since what you are describing
> > was already mentioned in the original design of the Volcano optimizer [1].
> >
> > "If a move to be pursued is the exploration of a normal query processing
> > algorithm such as merge-join, its cost is calculated by the algorithm's
> > cost function. The algorithm's applicability function determines the
> > physical properly vectors for the algorithms inputs, and their costs and
> > optimal plans are found by invoking FindBestPlan for the inputs. For some
> > binary operators, the actual physical properties of the inputs are not as
> > important as the consistency of physical properties among the inputs. For
> > example, for a sort-based implementation of intersection, i.e., an
> > algorithm very similar to merge-join, any sort order of the two inputs will
> > suffice as long as the two inputs are sorted in the same way. Similarly,
> > for a parallel join, any partitioning of join inputs across multiple
> > processing nodes is acceptable if both inputs are partitioned using
> > Compatible partitioning rules. For these cases, the search engine permits
> > the optimizer implementor to specify a number of physical property vectors
> > to be tried. For example, for the intersection of two inputs R and S with
> > attributes A, B, and C where R is sorted on (A,B,C) and S is sorted on
> > (B,A,C), both these sort orders can be specified by the optimizer
> > implementor and will be optimized by the generated optimizer, while other
> > possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> >
> > Best,
> > Stamatis
> >
> > [1]
> > https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> >
> > On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <h.y...@alibaba-inc.com>
> > wrote:
> >
> > > TL;DR
> > > Both top-down physical TraitSet request and bottom-up TraitSet
> > > derivation have their strongth and weakness, we propose
> > > on-demand TraitSet request to combine the above two, to reduce
> > > the number of plan alternatives that are genereated, especially
> > > in distributed system.
> > >
> > > e.g.
> > > select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> > >
> > > In non-distributed system, we can generate a sort merge join,
> > > requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > > But if foo happens to be sorted by f3,f2,f1, we may miss the
> > > chance of making use of the delivered ordering of foo. Because
> > > if we require bar to be sorted by b3,b2,b1, we don't need to
> > > sort on foo anymore. There are so many choices, n!, not even
> > > considering asc/desc and null direction. We can't request all
> > > the possible traitsets in top-down way, and can't derive all the
> > > possible traitsets in bottom-up way either.
> > >
> > > We propose on-demand traitset request by adding a new type
> > > of metadata DerivedTraitSets into the built-in metadata system.
> > >
> > > List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> > >
> > > In this metadata, every operator returns several possbile traitsets
> > > that may be derived from this operator.
> > >
> > > Using above query as an example, the tablescan on foo should
> > > return traiset with collation on f3, f2, f1.
> > >
> > > In physical implementation rules, e.g. the SortMergeJoinRule,
> > > it gets possible traitsets from both child operators, uses the join
> > > keys to eliminate useless traitsets, leaves out usefull traitsets,
> > > and requests corresponding traitset on the other child.
> > >
> > > This relies on the feature of AbstractConverter, which is turned
> > > off by default, due to performance issue [1].
> > >
> > > Thoughts?
> > >
> > > [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > >
> > > Haisheng
> > >
> > >

Reply via email to