Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Seliverstov Igor Fri, 13 Dec 2019 01:39:11 -0800

Colleagues,

As far as I understand, materialization acts like a special rule, that matches 
some subtree pattern (a leaf part of a query plan) to a star table, which may 
have better cost than the subtree, it replaces. Saying that, in general, there 
is no difference between approaches - they do the same almost in the same way 
but using different API.


My opinion is it’s better to do the deal using rules - it makes overall 
approach consistent.

Regards,
Igor

> 12 дек. 2019 г., в 10:03, Vladimir Ozerov <[email protected]> написал(а):
> 
> Roman,
> 
> What I am trying to understand is what advantage of materialization API you
> see over the normal optimization process? Does it save optimization time,
> or reduce memory footprint, or maybe provide better plans? I am asking
> because I do not see how expressing indexes as materializations fit
> classical optimization process. We discussed Sort <- Scan optimization.
> Let's consider another example:
> 
> LogicalSort[a ASC]
>  LogicalJoin
> 
> Initially, you do not know the implementation of the join, and hence do not
> know it's collation. Then you may execute physical join rules, which
> produce, say, PhysicalMergeJoin[a ASC]. If you execute sort implementation
> rule afterwards, you may easily eliminate the sort, or make it simpler
> (e.g. remove local sorting phase), depending on the distribution. In other
> words, proper implementation of sorting optimization assumes that you have
> a kind of SortRemoveRule anyway, irrespectively of whether you use
> materializations or not, because sorting may be injected on top of any
> operator. With this in mind, the use of materializations doesn't make the
> planner simpler. Neither it improves the outcome of the whole optimization
> process.
> 
> What is left is either lower CPU or RAM usage? Is this the case?
> 
> ср, 11 дек. 2019 г. в 18:37, Roman Kondakov <[email protected]>:
> 
>> Vladimir,
>> 
>> the main advantage of the Phoenix approach I can see is the using of
>> Calcite's native materializations API. Calcite has advanced support for
>> materializations [1] and lattices [2]. Since secondary indexes can be
>> considered as materialized views (it's just a sorted representation of
>> the same table) we can seamlessly use views to simulate indexes behavior
>> for Calcite planner.
>> 
>> 
>> [1] https://calcite.apache.org/docs/materialized_views.html
>> [2] https://calcite.apache.org/docs/lattice.html
>> 
>> --
>> Kind Regards
>> Roman Kondakov
>> 
>> 
>> On 11.12.2019 17:11, Vladimir Ozerov wrote:
>>> Roman,
>>> 
>>> What is the advantage of Phoenix approach then? BTW, it looks like
>> Phoenix
>>> integration with Calcite never made it to production, did it?
>>> 
>>> вт, 10 дек. 2019 г. в 19:50, Roman Kondakov <[email protected]
>>> :
>>> 
>>>> Hi Vladimir,
>>>> 
>>>> from what I understand, Drill does not exploit collation of indexes. To
>>>> be precise it does not exploit index collation in "natural" way where,
>>>> say, we a have sorted TableScan and hence we do not create a new Sort.
>>>> Instead of it Drill always create a Sort operator, but if TableScan can
>>>> be replaced with an IndexScan, this Sort operator is removed by the
>>>> dedicated rule.
>>>> 
>>>> Lets consider initial an operator tree:
>>>> 
>>>> Project
>>>>  Sort
>>>>    TableScan
>>>> 
>>>> after applying rule DbScanToIndexScanPrule this tree will be converted
>> to:
>>>> 
>>>> Project
>>>>  Sort
>>>>    IndexScan
>>>> 
>>>> and finally, after applying DbScanSortRemovalRule we have:
>>>> 
>>>> Project
>>>>  IndexScan
>>>> 
>>>> while for Phoenix approach we would have two equivalent subsets in our
>>>> planner:
>>>> 
>>>> Project
>>>>  Sort
>>>>    TableScan
>>>> 
>>>> and
>>>> 
>>>> Project
>>>>  IndexScan
>>>> 
>>>> and most likely the last plan  will be chosen as the best one.
>>>> 
>>>> --
>>>> Kind Regards
>>>> Roman Kondakov
>>>> 
>>>> 
>>>> On 10.12.2019 17:19, Vladimir Ozerov wrote:
>>>>> Hi Roman,
>>>>> 
>>>>> Why do you think that Drill-style will not let you exploit collation?
>>>>> Collation should be propagated from the index scan in the same way as
>> in
>>>>> other sorted operators, such as merge join or streaming aggregate.
>>>> Provided
>>>>> that you use converter-hack (or any alternative solution to trigger
>>>> parent
>>>>> re-analysis).
>>>>> In other words, propagation of collation from Drill-style indexes
>> should
>>>> be
>>>>> no different from other sorted operators.
>>>>> 
>>>>> Regards,
>>>>> Vladimir.
>>>>> 
>>>>> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky
>>>> <[email protected]
>>>>>> :
>>>>> 
>>>>>> 
>>>>>> Roman just as fast remark, Phoenix builds their approach on
>>>>>> already existing monolith HBase architecture, most cases it`s just a
>>>> stub
>>>>>> for someone who wants use secondary indexes with a base with no
>>>>>> native support of it. Don`t think it`s good idea here.
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ------- Forwarded message -------
>>>>>>> From: "Roman Kondakov" < [email protected] >
>>>>>>> To:  [email protected]
>>>>>>> Cc:
>>>>>>> Subject: Adding support for Ignite secondary indexes to Apache
>> Calcite
>>>>>>> planner
>>>>>>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>>>>>> 
>>>>>>> Hi all!
>>>>>>> 
>>>>>>> As you may know there is an activity on integration of Apache Calcite
>>>>>>> query optimizer into Ignite codebase is being carried out [1],[2].
>>>>>>> 
>>>>>>> One of a bunch of problems in this integration is the absence of
>>>>>>> out-of-the-box support for secondary indexes in Apache Calcite. After
>>>>>>> some research I came to conclusion that this problem has a couple of
>>>>>>> workarounds. Let's name them
>>>>>>> 1. Phoenix-style approach - representing secondary indexes as
>>>>>>> materialized views which are natively supported by Calcite engine [3]
>>>>>>> 2. Drill-style approach - pushing filters into the table scans and
>>>>>>> choose appropriate index for lookups when possible [4]
>>>>>>> 
>>>>>>> Both these approaches have advantages and disadvantages:
>>>>>>> 
>>>>>>> Phoenix style pros:
>>>>>>> - natural way of adding indexes as an alternative source of rows:
>> index
>>>>>>> can be considered as a kind of sorted materialized view.
>>>>>>> - possibility of using index sortedness for stream aggregates,
>>>>>>> deduplication (DISTINCT operator), merge joins, etc.
>>>>>>> - ability to support other types of indexes (i.e. functional
>> indexes).
>>>>>>> 
>>>>>>> Phoenix style cons:
>>>>>>> - polluting optimizer's search space extra table scans hence
>> increasing
>>>>>>> the planning time.
>>>>>>> 
>>>>>>> Drill style pros:
>>>>>>> - easier to implement (although it's questionable).
>>>>>>> - search space is not inflated.
>>>>>>> 
>>>>>>> Drill style cons:
>>>>>>> - missed opportunity to exploit sortedness.
>>>>>>> 
>>>>>>> There is a good discussion about using both approaches can be found
>> in
>>>>>> [5].
>>>>>>> 
>>>>>>> I made a small sketch [6] in order to demonstrate the applicability
>> of
>>>>>>> the Phoenix approach to Ignite. Key design concepts are:
>>>>>>> 1. On creating indexes are registered as tables in Calcite schema.
>> This
>>>>>>> step is needed for internal Calcite's routines.
>>>>>>> 2. On planner initialization we register these indexes as
>> materialized
>>>>>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>>>>>>> method.
>>>>>>> 3. Right before the query execution Calcite selects all materialized
>>>>>>> views (indexes) which can be potentially used in query.
>>>>>>> 4. During the query optimization indexes are registered by planner as
>>>>>>> usual TableScans and hence can be chosen by optimizer if they have
>>>> lower
>>>>>>> cost.
>>>>>>> 
>>>>>>> This sketch shows the ability to exploit index sortedness only. So
>> the
>>>>>>> future work in this direction should be focused on using indexes for
>>>>>>> fast index lookups. At first glance FilterableTable and
>>>>>>> FilterTableScanRule are good points to start. We can push Filter into
>>>>>>> the TableScan and then use FilterableTable for fast index lookups
>>>>>>> avoiding reading the whole index on TableScan step and then filtering
>>>>>>> its output on the Filter step.
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> [1]
>>>>>>> 
>>>>>> 
>>>> 
>> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>>>>>>> [2]
>>>>>>> 
>>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>>>>>>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>>>>>>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
>>>>>>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
>>>>>>> [6]  https://github.com/apache/ignite/pull/7115
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Reply via email to