Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Ivan Pavlukhin Wed, 11 Dec 2019 06:20:47 -0800

Vladimir,

You are right, Phoenix integration with Calcite stalled halfway. See
[1] for some of the reasons.


[1] 
https://lists.apache.org/thread.html/0152a97bfebb85c74f10e26e94ab9cd416dec374abba7dc2e1af9d61%40%3Cdev.phoenix.apache.org%3E

On Wed, 11 Dec 2019 at 17:11, Vladimir Ozerov <[email protected]> wrote:
>
> Roman,
>
> What is the advantage of the Phoenix approach, then? BTW, it looks like
> Phoenix integration with Calcite never made it to production, did it?
>
> On Tue, 10 Dec 2019 at 19:50, Roman Kondakov <[email protected]> wrote:
>
> > Hi Vladimir,
> >
> > From what I understand, Drill does not exploit the collation of indexes.
> > To be precise, it does not exploit index collation in the "natural" way
> > where, say, we have a sorted TableScan and hence do not create a new
> > Sort. Instead, Drill always creates a Sort operator, but if the TableScan
> > can be replaced with an IndexScan, this Sort operator is removed by a
> > dedicated rule.
> >
> > Let's consider an initial operator tree:
> >
> > Project
> >   Sort
> >     TableScan
> >
> > After applying the DbScanToIndexScanPrule rule, this tree is converted to:
> >
> > Project
> >   Sort
> >     IndexScan
> >
> > and finally, after applying DbScanSortRemovalRule, we have:
> >
> > Project
> >   IndexScan
> >
> > whereas with the Phoenix approach we would have two equivalent subsets in
> > our planner:
> >
> > Project
> >   Sort
> >     TableScan
> >
> > and
> >
> > Project
> >   IndexScan
> >
> > and most likely the latter plan will be chosen as the best one.
> >
> > --
> > Kind Regards
> > Roman Kondakov
> >
> >
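To make the two-rule rewrite Roman describes concrete, here is a toy Python model. It is a sketch only: `Node`, `scan_to_index_scan` and `remove_redundant_sort` are hypothetical names that loosely mirror DbScanToIndexScanPrule and DbScanSortRemovalRule; they are not Drill or Calcite code, and plans are simplified to single-input chains.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical toy plan node (not a Drill or Calcite class)."""
    op: str                          # "Project", "Sort", "TableScan" or "IndexScan"
    child: Optional["Node"] = None   # single input; toy plans are linear chains
    collation: Tuple[str, ...] = ()  # columns this node's output is sorted by


def show(node: Optional[Node]) -> str:
    """Render a plan top-down, e.g. 'Project > Sort > TableScan'."""
    ops = []
    while node is not None:
        ops.append(node.op)
        node = node.child
    return " > ".join(ops)


def scan_to_index_scan(node: Optional[Node], index_collation: Tuple[str, ...]) -> Optional[Node]:
    """Step 1 (in the spirit of DbScanToIndexScanPrule): replace TableScan with IndexScan."""
    if node is None:
        return None
    if node.op == "TableScan":
        return Node("IndexScan", None, index_collation)
    return Node(node.op, scan_to_index_scan(node.child, index_collation), node.collation)


def remove_redundant_sort(node: Optional[Node]) -> Optional[Node]:
    """Step 2 (in the spirit of DbScanSortRemovalRule): drop a Sort whose input
    is already ordered by the Sort's keys (the input collation starts with them)."""
    if node is None:
        return None
    child = remove_redundant_sort(node.child)
    if (node.op == "Sort" and child is not None
            and child.collation[:len(node.collation)] == node.collation):
        return child
    return Node(node.op, child, node.collation)


plan = Node("Project", Node("Sort", Node("TableScan"), ("a",)))
step1 = scan_to_index_scan(plan, ("a",))
step2 = remove_redundant_sort(step1)
print(show(plan))   # Project > Sort > TableScan
print(show(step1))  # Project > Sort > IndexScan
print(show(step2))  # Project > IndexScan
```

The Sort disappears only because the IndexScan reports a collation covering the Sort's keys; applied to the original plan (a TableScan with empty collation), `remove_redundant_sort` leaves it unchanged.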
> > On 10.12.2019 17:19, Vladimir Ozerov wrote:
> > > Hi Roman,
> > >
> > > Why do you think that the Drill-style approach will not let you exploit
> > > collation? Collation should be propagated from the index scan in the same
> > > way as from other sorted operators, such as merge join or streaming
> > > aggregate, provided that you use the converter hack (or any alternative
> > > solution to trigger parent re-analysis).
> > > In other words, propagation of collation from Drill-style indexes should
> > > be no different from other sorted operators.
> > >
> > > Regards,
> > > Vladimir.
> > >
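Vladimir's point that collation simply flows upward, as with other sorted operators, can be illustrated with a toy streaming aggregate. This is a sketch in plain Python, not Calcite's implementation; `streaming_sum` is a hypothetical helper. It computes `SUM(v) GROUP BY k` in one pass and is correct only when its input arrives sorted by `k`, which is what a scan of an index on `k` would supply without a separate Sort.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple


def streaming_sum(rows: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Toy streaming aggregate for SUM(v) GROUP BY k.

    itertools.groupby starts a new group whenever the key changes, so the
    result is correct only if `rows` arrive sorted (collated) by k.
    """
    return [(k, sum(v for _, v in group))
            for k, group in groupby(rows, key=itemgetter(0))]


table_order = [("b", 1), ("a", 2), ("b", 3)]          # unsorted, as a plain table scan might return
index_order = sorted(table_order, key=itemgetter(0))  # as a scan of an index on k would return

print(streaming_sum(table_order))  # [('b', 1), ('a', 2), ('b', 3)]  (group 'b' is split)
print(streaming_sum(index_order))  # [('a', 2), ('b', 4)]
```

The aggregate's output is again ordered by `k`, so a parent operator (for example a merge join on `k`) could rely on the same collation.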
> > > On Tue, 10 Dec 2019 at 16:40, Zhenya Stanilovsky <[email protected]> wrote:
> > >
> > >>
> > >> Roman, just a quick remark: Phoenix builds its approach on the already
> > >> existing monolithic HBase architecture; in most cases it's just a stub
> > >> for someone who wants to use secondary indexes with a database that has
> > >> no native support for them. I don't think it's a good idea here.
> > >>
> > >>>
> > >>>
> > >>> ------- Forwarded message -------
> > >>> From: "Roman Kondakov" <[email protected]>
> > >>> To: [email protected]
> > >>> Cc:
> > >>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
> > >>> planner
> > >>> Date: Tue, 10 Dec 2019 15:55:52 +0300
> > >>>
> > >>> Hi all!
> > >>>
> > >>> As you may know, work is under way to integrate the Apache Calcite
> > >>> query optimizer into the Ignite codebase [1], [2].
> > >>>
> > >>> One of the many problems in this integration is the absence of
> > >>> out-of-the-box support for secondary indexes in Apache Calcite. After
> > >>> some research I came to the conclusion that this problem has a couple
> > >>> of workarounds. Let's name them:
> > >>> 1. Phoenix-style approach - representing secondary indexes as
> > >>> materialized views, which are natively supported by the Calcite engine [3].
> > >>> 2. Drill-style approach - pushing filters into the table scans and
> > >>> choosing an appropriate index for lookups when possible [4].
> > >>>
> > >>> Both approaches have advantages and disadvantages:
> > >>>
> > >>> Phoenix style pros:
> > >>> - natural way of adding indexes as an alternative source of rows: an
> > >>> index can be considered a kind of sorted materialized view.
> > >>> - possibility of using index sortedness for streaming aggregates,
> > >>> deduplication (DISTINCT operator), merge joins, etc.
> > >>> - ability to support other types of indexes (e.g. functional indexes).
> > >>>
> > >>> Phoenix style cons:
> > >>> - polluting the optimizer's search space with extra table scans, hence
> > >>> increasing the planning time.
> > >>>
> > >>> Drill style pros:
> > >>> - easier to implement (although it's questionable).
> > >>> - search space is not inflated.
> > >>>
> > >>> Drill style cons:
> > >>> - missed opportunity to exploit sortedness.
> > >>>
> > >>> A good discussion of using both approaches can be found in [5].
> > >>>
> > >>> I made a small sketch [6] to demonstrate the applicability of the
> > >>> Phoenix approach to Ignite. The key design concepts are:
> > >>> 1. On creation, indexes are registered as tables in the Calcite schema.
> > >>> This step is needed for Calcite's internal routines.
> > >>> 2. On planner initialization, we register these indexes as materialized
> > >>> views in Calcite's optimizer using the VolcanoPlanner#addMaterialization
> > >>> method.
> > >>> 3. Right before query execution, Calcite selects all materialized views
> > >>> (indexes) that can potentially be used in the query.
> > >>> 4. During query optimization, indexes are registered by the planner as
> > >>> ordinary TableScans and hence can be chosen by the optimizer if they
> > >>> have a lower cost.
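Steps 3 and 4 amount to a cost-based choice among equivalent access paths. A toy Python model of that choice follows; `Access`, `satisfies`, `plan_cost` and `choose`, the `person` table, the `person_age_idx` index and all cost numbers are hypothetical, and none of this is Calcite's `VolcanoPlanner` or materialization API.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Access:
    """Hypothetical access path: the base table, or an index registered as a
    sorted 'materialized view' over it. Toy model, not Calcite's API."""
    name: str
    columns: FrozenSet[str]      # columns the scan can return
    collation: Tuple[str, ...]   # order in which the scan returns rows
    row_cost: float              # toy per-row cost of the scan


def satisfies(collation: Tuple[str, ...], required: Tuple[str, ...]) -> bool:
    """True if rows ordered by `collation` are also ordered by `required`."""
    return collation[:len(required)] == required


def plan_cost(a: Access, required: Tuple[str, ...], rows: int) -> float:
    """Toy cost of 'scan [+ Sort]' that returns rows in `required` order."""
    cost = a.row_cost * rows
    if not satisfies(a.collation, required):
        cost += rows * math.log2(rows)  # an explicit Sort is needed
    return cost


def choose(paths: List[Access], needed: FrozenSet[str],
           required: Tuple[str, ...], rows: int) -> str:
    # Step 3 analogue: keep only the views (indexes) that can answer the query.
    usable = [a for a in paths if needed <= a.columns]
    # Step 4 analogue: all usable paths are equivalent alternatives; pick the cheapest.
    best = min(usable, key=lambda a: plan_cost(a, required, rows))
    return best.name if satisfies(best.collation, required) else "Sort > " + best.name


table = Access("TableScan(person)", frozenset({"id", "name", "age"}), (), 1.0)
by_age = Access("IndexScan(person_age_idx)", frozenset({"id", "age"}), ("age",), 1.5)
paths = [table, by_age]

# SELECT id FROM person ORDER BY age: the index covers the query and is already sorted.
print(choose(paths, frozenset({"id", "age"}), ("age",), 1_000_000))    # IndexScan(person_age_idx)
# SELECT name FROM person ORDER BY age: only the table has "name", so a Sort is added.
print(choose(paths, frozenset({"name", "age"}), ("age",), 1_000_000))  # Sort > TableScan(person)
```

An index that does not cover the query is filtered out before costing, so the planner falls back to the table scan plus an explicit Sort; when no order is required, the cheaper-per-row table scan wins.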
> > >>>
> > >>> This sketch only demonstrates the ability to exploit index sortedness,
> > >>> so future work in this direction should focus on using indexes for fast
> > >>> lookups. At first glance, FilterableTable and FilterTableScanRule are
> > >>> good starting points. We can push the Filter into the TableScan and then
> > >>> use FilterableTable for fast index lookups, avoiding reading the whole
> > >>> index in the TableScan step and then filtering its output in the Filter
> > >>> step.
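The difference between filtering after a full scan and pushing the filter into the scan can be sketched in Python with a sorted list standing in for the index. This is only an illustration under assumptions: `INDEX`, `scan_then_filter` and `filtered_scan` are hypothetical, and handing the predicate bounds to the scan merely mimics the idea of a FilterableTable receiving pushed filters.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# Hypothetical sorted index on "age": (key, row_id) pairs kept in key order.
INDEX: List[Tuple[int, int]] = sorted([(34, 1), (19, 2), (42, 3), (27, 4), (34, 5), (61, 6)])


def scan_then_filter(lo: int, hi: int) -> List[int]:
    """No push-down: the scan reads every index entry, a separate Filter drops rows."""
    return [rid for key, rid in INDEX if lo <= key <= hi]


def filtered_scan(lo: int, hi: int) -> List[int]:
    """Push-down: the scan receives the predicate lo <= age <= hi and does a
    range lookup via binary search, touching only the matching entries."""
    start = bisect_left(INDEX, (lo, float("-inf")))  # first entry with key >= lo
    end = bisect_right(INDEX, (hi, float("inf")))    # one past last entry with key <= hi
    return [rid for _, rid in INDEX[start:end]]


print(filtered_scan(25, 40))     # [4, 1, 5]
print(scan_then_filter(25, 40))  # [4, 1, 5]
```

Both functions return the same row ids in index (age) order, so sortedness is preserved either way; the pushed-down version costs O(log n + k) instead of O(n).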
> > >>>
> > >>> What do you think?
> > >>>
> > >>>
> > >>>
> > >>> [1] http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
> > >>> [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
> > >>> [3] https://issues.apache.org/jira/browse/PHOENIX-2047
> > >>> [4] https://issues.apache.org/jira/browse/DRILL-6381
> > >>> [5] https://issues.apache.org/jira/browse/DRILL-3929
> > >>> [6] https://github.com/apache/ignite/pull/7115
> > >>
> > >>
> > >>
> > >>
> > >
> >



-- 
Best regards,
Ivan Pavlukhin
