Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Roman Kondakov Tue, 10 Dec 2019 08:51:43 -0800

Hi Vladimir,

From what I understand, Drill does not exploit the collation of indexes. To
be precise, it does not exploit index collation in the "natural" way, where,
say, we have a sorted TableScan and hence never create a Sort in the first
place. Instead, Drill always creates a Sort operator, and if the TableScan
can be replaced with an IndexScan, that Sort operator is removed by a
dedicated rule.


Let's consider an initial operator tree:

Project
  Sort
    TableScan

After applying the DbScanToIndexScanPrule rule, this tree is converted to:

Project
  Sort
    IndexScan

Finally, after applying DbScanSortRemovalRule, we have:

Project
  IndexScan

With the Phoenix approach, by contrast, we would have two equivalent
subsets in our planner:

Project
  Sort
    TableScan

and

Project
  IndexScan

and, most likely, the latter plan will be chosen as the best one.
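
To make the difference concrete, here is a minimal toy sketch in Python. It
is not Calcite or Drill code: the classes Scan, Sort and Project, the
INDEXES catalog, the index name emp_salary_idx and the cost model are all
assumptions made up for illustration. It only mimics the two rewrite steps
of the Drill-style flow and the cost-based choice between two equivalent
plans in the Phoenix-style flow.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

# Toy operators. These are NOT Calcite or Drill classes; the names only
# mirror the operators in the trees above.
@dataclass(frozen=True)
class Scan:
    table: str
    index: Optional[str] = None        # None means a plain TableScan
    collation: Tuple[str, ...] = ()    # columns the output is sorted by

@dataclass(frozen=True)
class Sort:
    keys: Tuple[str, ...]
    input: object

@dataclass(frozen=True)
class Project:
    input: object

# Hypothetical index catalog: table -> {index name -> sorted columns}.
INDEXES = {"emp": {"emp_salary_idx": ("salary",)}}

def scan_to_index_scan(node):
    """Drill-style step 1: under a Sort, swap a TableScan for a matching IndexScan."""
    if isinstance(node, Project):
        return Project(scan_to_index_scan(node.input))
    if isinstance(node, Sort) and isinstance(node.input, Scan) and node.input.index is None:
        for name, cols in INDEXES.get(node.input.table, {}).items():
            if cols[:len(node.keys)] == node.keys:
                return Sort(node.keys, Scan(node.input.table, name, cols))
    return node

def remove_redundant_sort(node):
    """Drill-style step 2: drop a Sort whose input is already sorted on its keys."""
    if isinstance(node, Project):
        return Project(remove_redundant_sort(node.input))
    if isinstance(node, Sort) and isinstance(node.input, Scan):
        if node.input.collation[:len(node.keys)] == node.keys:
            return node.input
    return node

def cost(node, rows=1000.0):
    """Made-up cost model: a scan costs `rows`, a Sort adds rows * log2(rows)."""
    if isinstance(node, Project):
        return cost(node.input, rows)
    if isinstance(node, Sort):
        return cost(node.input, rows) + rows * math.log2(rows)
    return rows

def cheapest(alternatives):
    """Phoenix-style: equivalent plans coexist and the cheapest one wins."""
    return min(alternatives, key=cost)

initial = Project(Sort(("salary",), Scan("emp")))

# Drill style: the Sort is always created, then removed by a dedicated rule.
drill_plan = remove_redundant_sort(scan_to_index_scan(initial))

# Phoenix style: the index is just another sorted source of rows.
index_plan = Project(Scan("emp", "emp_salary_idx", ("salary",)))
phoenix_plan = cheapest([initial, index_plan])
```

In this toy model both flows end up with the same Project over an IndexScan.
The difference is only in how they get there: by two explicit rewrites in
the first case, and by comparing costs of equivalent plans in the second.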

-- 
Kind Regards
Roman Kondakov


On 10.12.2019 17:19, Vladimir Ozerov wrote:
> Hi Roman,
> 
> Why do you think that the Drill-style approach will not let you exploit
> collation? Collation should be propagated from the index scan in the same
> way as from other sorted operators, such as merge join or streaming
> aggregate, provided that you use the converter hack (or any alternative
> solution to trigger parent re-analysis).
> In other words, propagation of collation from Drill-style indexes should be
> no different from other sorted operators.
> 
> Regards,
> Vladimir.
> 
> On Tue, 10 Dec 2019 at 16:40, Zhenya Stanilovsky <arzamas...@mail.ru.invalid>
> wrote:
> 
>>
>> Roman, just a quick remark: Phoenix builds its approach on the
>> already existing monolithic HBase architecture. In most cases it's just a
>> stub for someone who wants to use secondary indexes with a database that
>> has no native support for them. I don't think it's a good idea here.
>>
>>>
>>>
>>> ------- Forwarded message -------
>>> From: "Roman Kondakov" < kondako...@mail.ru.invalid >
>>> To:  dev@ignite.apache.org
>>> Cc:
>>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
>>> planner
>>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>>
>>> Hi all!
>>>
>>> As you may know, work is under way to integrate the Apache Calcite
>>> query optimizer into the Ignite codebase [1],[2].
>>>
>>> One of the problems in this integration is the absence of
>>> out-of-the-box support for secondary indexes in Apache Calcite. After
>>> some research I came to the conclusion that this problem has a couple of
>>> workarounds. Let's name them:
>>> 1. Phoenix-style approach - representing secondary indexes as
>>> materialized views, which are natively supported by the Calcite engine [3]
>>> 2. Drill-style approach - pushing filters into the table scans and
>>> choosing an appropriate index for lookups when possible [4]
>>>
>>> Both these approaches have advantages and disadvantages:
>>>
>>> Phoenix style pros:
>>> - natural way of adding indexes as an alternative source of rows: index
>>> can be considered as a kind of sorted materialized view.
>>> - possibility of using index sortedness for stream aggregates,
>>> deduplication (DISTINCT operator), merge joins, etc.
>>> - ability to support other types of indexes (e.g. functional indexes).
>>>
>>> Phoenix style cons:
>>> - polluting the optimizer's search space with extra table scans, hence
>>> increasing the planning time.
>>>
>>> Drill style pros:
>>> - easier to implement (although it's questionable).
>>> - search space is not inflated.
>>>
>>> Drill style cons:
>>> - missed opportunity to exploit sortedness.
>>>
>>> A good discussion of both approaches can be found in [5].
>>>
>>> I made a small sketch [6] in order to demonstrate the applicability of
>>> the Phoenix approach to Ignite. Key design concepts are:
>>> 1. On creation, indexes are registered as tables in the Calcite schema.
>>> This step is needed for Calcite's internal routines.
>>> 2. On planner initialization we register these indexes as materialized
>>> views in Calcite's optimizer using the VolcanoPlanner#addMaterialization
>>> method.
>>> 3. Right before query execution Calcite selects all materialized
>>> views (indexes) that can potentially be used in the query.
>>> 4. During query optimization the indexes are registered by the planner as
>>> ordinary TableScans and hence can be chosen by the optimizer if they have
>>> a lower cost.
>>>
>>> This sketch shows the ability to exploit index sortedness only. So
>>> future work in this direction should focus on using indexes for
>>> fast index lookups. At first glance FilterableTable and
>>> FilterTableScanRule are good starting points. We can push the Filter into
>>> the TableScan and then use FilterableTable for fast index lookups,
>>> instead of reading the whole index in the TableScan step and then
>>> filtering its output in the Filter step.
>>>
>>> What do you think?
>>>
>>>
>>>
>>> [1]  http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>>> [2]  https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
>>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
>>> [6]  https://github.com/apache/ignite/pull/7115
>>
>>
>>
>>
>
