Colleagues, As far as I understand, materialization acts like a special rule, that matches some subtree pattern (a leaf part of a query plan) to a star table, which may have better cost than the subtree, it replaces. Saying that, in general, there is no difference between approaches - they do the same almost in the same way but using different API.
My opinion is it’s better to do the deal using rules - it makes overall approach consistent. Regards, Igor > 12 дек. 2019 г., в 10:03, Vladimir Ozerov <ppoze...@gmail.com> написал(а): > > Roman, > > What I am trying to understand is what advantage of materialization API you > see over the normal optimization process? Does it save optimization time, > or reduce memory footprint, or maybe provide better plans? I am asking > because I do not see how expressing indexes as materializations fit > classical optimization process. We discussed Sort <- Scan optimization. > Let's consider another example: > > LogicalSort[a ASC] > LogicalJoin > > Initially, you do not know the implementation of the join, and hence do not > know it's collation. Then you may execute physical join rules, which > produce, say, PhysicalMergeJoin[a ASC]. If you execute sort implementation > rule afterwards, you may easily eliminate the sort, or make it simpler > (e.g. remove local sorting phase), depending on the distribution. In other > words, proper implementation of sorting optimization assumes that you have > a kind of SortRemoveRule anyway, irrespectively of whether you use > materializations or not, because sorting may be injected on top of any > operator. With this in mind, the use of materializations doesn't make the > planner simpler. Neither it improves the outcome of the whole optimization > process. > > What is left is either lower CPU or RAM usage? Is this the case? > > ср, 11 дек. 2019 г. в 18:37, Roman Kondakov <kondako...@mail.ru.invalid>: > >> Vladimir, >> >> the main advantage of the Phoenix approach I can see is the using of >> Calcite's native materializations API. Calcite has advanced support for >> materializations [1] and lattices [2]. Since secondary indexes can be >> considered as materialized views (it's just a sorted representation of >> the same table) we can seamlessly use views to simulate indexes behavior >> for Calcite planner. >> >> >> [1] https://calcite.apache.org/docs/materialized_views.html >> [2] https://calcite.apache.org/docs/lattice.html >> >> -- >> Kind Regards >> Roman Kondakov >> >> >> On 11.12.2019 17:11, Vladimir Ozerov wrote: >>> Roman, >>> >>> What is the advantage of Phoenix approach then? BTW, it looks like >> Phoenix >>> integration with Calcite never made it to production, did it? >>> >>> вт, 10 дек. 2019 г. в 19:50, Roman Kondakov <kondako...@mail.ru.invalid >>> : >>> >>>> Hi Vladimir, >>>> >>>> from what I understand, Drill does not exploit collation of indexes. To >>>> be precise it does not exploit index collation in "natural" way where, >>>> say, we a have sorted TableScan and hence we do not create a new Sort. >>>> Instead of it Drill always create a Sort operator, but if TableScan can >>>> be replaced with an IndexScan, this Sort operator is removed by the >>>> dedicated rule. >>>> >>>> Lets consider initial an operator tree: >>>> >>>> Project >>>> Sort >>>> TableScan >>>> >>>> after applying rule DbScanToIndexScanPrule this tree will be converted >> to: >>>> >>>> Project >>>> Sort >>>> IndexScan >>>> >>>> and finally, after applying DbScanSortRemovalRule we have: >>>> >>>> Project >>>> IndexScan >>>> >>>> while for Phoenix approach we would have two equivalent subsets in our >>>> planner: >>>> >>>> Project >>>> Sort >>>> TableScan >>>> >>>> and >>>> >>>> Project >>>> IndexScan >>>> >>>> and most likely the last plan will be chosen as the best one. >>>> >>>> -- >>>> Kind Regards >>>> Roman Kondakov >>>> >>>> >>>> On 10.12.2019 17:19, Vladimir Ozerov wrote: >>>>> Hi Roman, >>>>> >>>>> Why do you think that Drill-style will not let you exploit collation? >>>>> Collation should be propagated from the index scan in the same way as >> in >>>>> other sorted operators, such as merge join or streaming aggregate. >>>> Provided >>>>> that you use converter-hack (or any alternative solution to trigger >>>> parent >>>>> re-analysis). >>>>> In other words, propagation of collation from Drill-style indexes >> should >>>> be >>>>> no different from other sorted operators. >>>>> >>>>> Regards, >>>>> Vladimir. >>>>> >>>>> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky >>>> <arzamas...@mail.ru.invalid >>>>>> : >>>>> >>>>>> >>>>>> Roman just as fast remark, Phoenix builds their approach on >>>>>> already existing monolith HBase architecture, most cases it`s just a >>>> stub >>>>>> for someone who wants use secondary indexes with a base with no >>>>>> native support of it. Don`t think it`s good idea here. >>>>>> >>>>>>> >>>>>>> >>>>>>> ------- Forwarded message ------- >>>>>>> From: "Roman Kondakov" < kondako...@mail.ru.invalid > >>>>>>> To: dev@ignite.apache.org >>>>>>> Cc: >>>>>>> Subject: Adding support for Ignite secondary indexes to Apache >> Calcite >>>>>>> planner >>>>>>> Date: Tue, 10 Dec 2019 15:55:52 +0300 >>>>>>> >>>>>>> Hi all! >>>>>>> >>>>>>> As you may know there is an activity on integration of Apache Calcite >>>>>>> query optimizer into Ignite codebase is being carried out [1],[2]. >>>>>>> >>>>>>> One of a bunch of problems in this integration is the absence of >>>>>>> out-of-the-box support for secondary indexes in Apache Calcite. After >>>>>>> some research I came to conclusion that this problem has a couple of >>>>>>> workarounds. Let's name them >>>>>>> 1. Phoenix-style approach - representing secondary indexes as >>>>>>> materialized views which are natively supported by Calcite engine [3] >>>>>>> 2. Drill-style approach - pushing filters into the table scans and >>>>>>> choose appropriate index for lookups when possible [4] >>>>>>> >>>>>>> Both these approaches have advantages and disadvantages: >>>>>>> >>>>>>> Phoenix style pros: >>>>>>> - natural way of adding indexes as an alternative source of rows: >> index >>>>>>> can be considered as a kind of sorted materialized view. >>>>>>> - possibility of using index sortedness for stream aggregates, >>>>>>> deduplication (DISTINCT operator), merge joins, etc. >>>>>>> - ability to support other types of indexes (i.e. functional >> indexes). >>>>>>> >>>>>>> Phoenix style cons: >>>>>>> - polluting optimizer's search space extra table scans hence >> increasing >>>>>>> the planning time. >>>>>>> >>>>>>> Drill style pros: >>>>>>> - easier to implement (although it's questionable). >>>>>>> - search space is not inflated. >>>>>>> >>>>>>> Drill style cons: >>>>>>> - missed opportunity to exploit sortedness. >>>>>>> >>>>>>> There is a good discussion about using both approaches can be found >> in >>>>>> [5]. >>>>>>> >>>>>>> I made a small sketch [6] in order to demonstrate the applicability >> of >>>>>>> the Phoenix approach to Ignite. Key design concepts are: >>>>>>> 1. On creating indexes are registered as tables in Calcite schema. >> This >>>>>>> step is needed for internal Calcite's routines. >>>>>>> 2. On planner initialization we register these indexes as >> materialized >>>>>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization >>>>>>> method. >>>>>>> 3. Right before the query execution Calcite selects all materialized >>>>>>> views (indexes) which can be potentially used in query. >>>>>>> 4. During the query optimization indexes are registered by planner as >>>>>>> usual TableScans and hence can be chosen by optimizer if they have >>>> lower >>>>>>> cost. >>>>>>> >>>>>>> This sketch shows the ability to exploit index sortedness only. So >> the >>>>>>> future work in this direction should be focused on using indexes for >>>>>>> fast index lookups. At first glance FilterableTable and >>>>>>> FilterTableScanRule are good points to start. We can push Filter into >>>>>>> the TableScan and then use FilterableTable for fast index lookups >>>>>>> avoiding reading the whole index on TableScan step and then filtering >>>>>>> its output on the Filter step. >>>>>>> >>>>>>> What do you think? >>>>>>> >>>>>>> >>>>>>> >>>>>>> [1] >>>>>>> >>>>>> >>>> >> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none >>>>>>> [2] >>>>>>> >>>>>> >>>> >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine >>>>>>> [3] https://issues.apache.org/jira/browse/PHOENIX-2047 >>>>>>> [4] https://issues.apache.org/jira/browse/DRILL-6381 >>>>>>> [5] https://issues.apache.org/jira/browse/DRILL-3929 >>>>>>> [6] https://github.com/apache/ignite/pull/7115 >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >>