Hi Vladimir,

From what I understand, Drill does not exploit index collation. To be precise, it does not exploit index collation in the "natural" way, where, say, we have a sorted TableScan and hence do not create a new Sort. Instead, Drill always creates a Sort operator, but if the TableScan can be replaced with an IndexScan, this Sort operator is removed by a dedicated rule.
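To make the two-step rewrite concrete, here is a minimal toy sketch in Python. It is not Drill code: the `Node` class and both functions are hypothetical stand-ins that only mimic the effect I attribute to DbScanToIndexScanPrule and DbScanSortRemovalRule, under the assumption that a Sort is redundant when its keys are a prefix of the index ordering.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical single-input plan node (not a Drill or Calcite class)."""
    op: str                            # "Project", "Sort", "TableScan", "IndexScan"
    child: Optional["Node"] = None
    sort_keys: Tuple[str, ...] = ()    # required by Sort / provided by IndexScan


def scan_to_index_scan(node, index_keys):
    """Toy analogue of DbScanToIndexScanPrule: swap TableScan for IndexScan."""
    if node is None:
        return None
    if node.op == "TableScan":
        return Node("IndexScan", None, tuple(index_keys))
    return Node(node.op, scan_to_index_scan(node.child, index_keys), node.sort_keys)


def remove_redundant_sort(node):
    """Toy analogue of DbScanSortRemovalRule: drop a Sort over a suitably ordered IndexScan."""
    if node is None:
        return None
    child = remove_redundant_sort(node.child)
    if (node.op == "Sort" and child is not None and child.op == "IndexScan"
            and child.sort_keys[:len(node.sort_keys)] == node.sort_keys):
        return child  # the index already delivers rows in the required order
    return Node(node.op, child, node.sort_keys)


def shape(node):
    """List operator names from the root down, for easy inspection."""
    ops = []
    while node is not None:
        ops.append(node.op)
        node = node.child
    return ops


plan = Node("Project", Node("Sort", Node("TableScan"), ("a",)))
step1 = scan_to_index_scan(plan, ("a", "b"))   # Sort is still present here
step2 = remove_redundant_sort(step1)           # Sort removed by the second rule
```

The point of the sketch is that the Sort disappears only because a second, dedicated rewrite fires, not because the planner derived the ordering from the scan itself.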
Let's consider an initial operator tree:

Project
  Sort
    TableScan

After applying the DbScanToIndexScanPrule rule, this tree is converted to:

Project
  Sort
    IndexScan

and finally, after applying DbScanSortRemovalRule, we have:

Project
  IndexScan

With the Phoenix approach, on the other hand, we would have two equivalent subsets in our planner:

Project
  Sort
    TableScan

and

Project
  IndexScan

and most likely the latter plan would be chosen as the best one.

-- 
Kind Regards
Roman Kondakov

On 10.12.2019 17:19, Vladimir Ozerov wrote:
> Hi Roman,
>
> Why do you think that Drill-style will not let you exploit collation?
> Collation should be propagated from the index scan in the same way as in
> other sorted operators, such as merge join or streaming aggregate. Provided
> that you use converter-hack (or any alternative solution to trigger parent
> re-analysis).
> In other words, propagation of collation from Drill-style indexes should be
> no different from other sorted operators.
>
> Regards,
> Vladimir.
>
> Tue, 10 Dec 2019 at 16:40, Zhenya Stanilovsky <arzamas...@mail.ru.invalid>:
>
>>
>> Roman, just a quick remark: Phoenix builds its approach on the
>> already existing monolithic HBase architecture; in most cases it's just a stub
>> for someone who wants to use secondary indexes with a database that has no
>> native support for them. I don't think it's a good idea here.
>>
>>>
>>>
>>> ------- Forwarded message -------
>>> From: "Roman Kondakov" <kondako...@mail.ru.invalid>
>>> To: dev@ignite.apache.org
>>> Cc:
>>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
>>> planner
>>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>>
>>> Hi all!
>>>
>>> As you may know, work on integrating the Apache Calcite query optimizer
>>> into the Ignite codebase is under way [1],[2].
>>>
>>> One of a bunch of problems in this integration is the absence of
>>> out-of-the-box support for secondary indexes in Apache Calcite.
>>> After some research I came to the conclusion that this problem has a couple of
>>> workarounds. Let's name them:
>>> 1. Phoenix-style approach - representing secondary indexes as
>>> materialized views, which are natively supported by the Calcite engine [3]
>>> 2. Drill-style approach - pushing filters into the table scans and
>>> choosing an appropriate index for lookups when possible [4]
>>>
>>> Both these approaches have advantages and disadvantages:
>>>
>>> Phoenix style pros:
>>> - natural way of adding indexes as an alternative source of rows: an index
>>> can be considered a kind of sorted materialized view.
>>> - possibility of using index sortedness for stream aggregates,
>>> deduplication (DISTINCT operator), merge joins, etc.
>>> - ability to support other types of indexes (e.g. functional indexes).
>>>
>>> Phoenix style cons:
>>> - polluting the optimizer's search space with extra table scans, hence
>>> increasing the planning time.
>>>
>>> Drill style pros:
>>> - easier to implement (although it's questionable).
>>> - search space is not inflated.
>>>
>>> Drill style cons:
>>> - missed opportunity to exploit sortedness.
>>>
>>> A good discussion about using both approaches can be found in [5].
>>>
>>> I made a small sketch [6] in order to demonstrate the applicability of
>>> the Phoenix approach to Ignite. Key design concepts are:
>>> 1. On creation, indexes are registered as tables in the Calcite schema. This
>>> step is needed for Calcite's internal routines.
>>> 2. On planner initialization we register these indexes as materialized
>>> views in Calcite's optimizer using the VolcanoPlanner#addMaterialization
>>> method.
>>> 3. Right before query execution, Calcite selects all materialized
>>> views (indexes) which can potentially be used in the query.
>>> 4. During query optimization, indexes are registered by the planner as
>>> usual TableScans and hence can be chosen by the optimizer if they have a lower
>>> cost.
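Step 4 of the list above can be illustrated with a toy cost comparison in Python. This is only a sketch under assumptions: the function names and the cost formulas (a linear scan cost plus n*log2(n) for an explicit Sort) are hypothetical and are not Calcite's cost model or API. It just shows an index being offered as one more equivalent alternative and winning when its ordering covers the required one.

```python
import math


def plan_via_table_scan(rows):
    """Unsorted scan plus an explicit Sort; assumed cost n + n*log2(n)."""
    return ("Project <- Sort <- TableScan", rows + rows * math.log2(rows))


def plan_via_index_scan(rows, index_keys, required_keys):
    """Sorted index scan; usable only if the required keys are a prefix of the index keys."""
    if tuple(index_keys[:len(required_keys)]) != tuple(required_keys):
        return None
    return ("Project <- IndexScan", float(rows))  # assumed cost: one pass, no Sort


def choose_best(rows, required_keys, indexes):
    """Collect all equivalent alternatives and pick the cheapest one."""
    candidates = [plan_via_table_scan(rows)]
    for keys in indexes:
        alternative = plan_via_index_scan(rows, keys, required_keys)
        if alternative is not None:
            candidates.append(alternative)
    return min(candidates, key=lambda candidate: candidate[1])


best_with_index = choose_best(1024, ("a",), [("a", "b")])
best_without_index = choose_best(1024, ("c",), [("a", "b")])
```

With an index on (a, b) and a required ordering on a, the index scan alternative is cheaper; with a required ordering on c, only the scan-plus-sort plan is available.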
>>>
>>> This sketch shows the ability to exploit index sortedness only. So
>>> future work in this direction should focus on using indexes for
>>> fast index lookups. At first glance, FilterableTable and
>>> FilterTableScanRule are good starting points. We can push the Filter into
>>> the TableScan and then use FilterableTable for fast index lookups,
>>> avoiding reading the whole index at the TableScan step and then filtering
>>> its output at the Filter step.
>>>
>>> What do you think?
>>>
>>> [1] http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>>> [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>>> [3] https://issues.apache.org/jira/browse/PHOENIX-2047
>>> [4] https://issues.apache.org/jira/browse/DRILL-6381
>>> [5] https://issues.apache.org/jira/browse/DRILL-3929
>>> [6] https://github.com/apache/ignite/pull/7115