The 1 approach is what actually Apache Flink did and works great. Vladimir Ozerov <ppoze...@gmail.com>于2019年11月20日 周三上午2:22写道:
> Hi Stamatis, > > This approach worth trying, but I am afraid that it may produce too many > permutations. For example, this is the logical plan of one of TPC-H > benchmarks, and I am not very keen expanding it much further :-) > > 901:ProjectLogicalRel(ps_partkey=[$0], val=[$1]) > 899:FilterLogicalRel(subset=[rel#900:Subset#66.LOGICAL.ANY.[]], > condition=[>($1, $2)]) > 897:JoinLogicalRel(subset=[rel#898:Subset#65.LOGICAL.ANY.[]], > condition=[true], joinType=[left]) > 882:AggregateLogicalRel(subset=[rel#883:Subset#57.LOGICAL.ANY.[]], > group=[{0}], val=[SUM($1)]) > 880:ProjectLogicalRel(subset=[rel#881:Subset#56.LOGICAL.ANY.[]], > ps_partkey=[$1], $f1=[*($2, $3)]) > 878:JoinLogicalRel(subset=[rel#879:Subset#55.LOGICAL.ANY.[]], > condition=[=($5, $6)], joinType=[inner]) > 875:JoinLogicalRel(subset=[rel#876:Subset#53.LOGICAL.ANY.[]], > condition=[=($0, $4)], joinType=[inner]) > > 634:MapScanLogicalRel(subset=[rel#873:Subset#51.LOGICAL.ANY.[]], > table=[[partsupp]], projects=[[0, 1, 2, 3]]) > > 632:MapScanLogicalRel(subset=[rel#874:Subset#52.LOGICAL.ANY.[]], > table=[[supplier]], projects=[[0, 1]]) > > 630:MapScanLogicalRel(subset=[rel#877:Subset#54.LOGICAL.ANY.[]], > table=[[nation]], projects=[[0, 1]], filter=[=($1, '[NATION]')]) > 895:AggregateLogicalRel(subset=[rel#896:Subset#64.LOGICAL.ANY.[]], > group=[{}], agg#0=[SINGLE_VALUE($0)]) > 893:ProjectLogicalRel(subset=[rel#894:Subset#63.LOGICAL.ANY.[]], > EXPR$0=[*($0, 0.9:DECIMAL(2, 1))]) > > 891:AggregateLogicalRel(subset=[rel#892:Subset#62.LOGICAL.ANY.[]], > group=[{}], agg#0=[SUM($0)]) > > 889:ProjectLogicalRel(subset=[rel#890:Subset#61.LOGICAL.ANY.[]], $f0=[*($1, > $2)]) > 887:JoinLogicalRel(subset=[rel#888:Subset#60.LOGICAL.ANY.[]], > condition=[=($4, $5)], joinType=[inner]) > > 885:JoinLogicalRel(subset=[rel#886:Subset#59.LOGICAL.ANY.[]], > condition=[=($0, $3)], joinType=[inner]) > > 680:MapScanLogicalRel(subset=[rel#884:Subset#58.LOGICAL.ANY.[]], > table=[[partsupp]], projects=[[0, 2, 3]]) > > 632:MapScanLogicalRel(subset=[rel#874:Subset#52.LOGICAL.ANY.[]], > table=[[supplier]], projects=[[0, 1]]) > > 630:MapScanLogicalRel(subset=[rel#877:Subset#54.LOGICAL.ANY.[]], > table=[[nation]], projects=[[0, 1]], filter=[=($1, '[NATION]')]) > > I have two working solutions at the moment, which do not require > cross-convention converters: > 1) Create separate rules for logical-physical node conversion, and for > physical-physical optimization > 2) A kind of hack: add fake AbstractConverter with logical traits after the > physical node is created > > I hope one of them will work good enough for the production case. Most > likely it would be p.1. > > Regards, > Vladimir. > > > пт, 15 нояб. 2019 г. в 11:04, Stamatis Zampetakis <zabe...@gmail.com>: > > > Hi again Vladimir, > > > > In my previous email, I mentioned three rules for performing logical to > > physical conversions. > > It's normal that if you add more operators we will end up with more > rules. > > Now that the example has a filter we have the following rule list: > > > > Rule 1: RootLogicalRel -> RootPhysicalRel > > Rule 2: ProjectLogicalRel -> ProjectPhysicalRel > > Rule 3: FilterLogicalRel -> FilterPhysicalRel > > Rule 4: MapScanLogicalRel -> MapScanPhysicalRel > > > > If I understand well, your concern is that in order to get the optimal > plan > > you would have to introduce many additional rules for the Exchange > operator > > (e.g., ExchangeProjectTransposeRule, ExchangeFilterTransposeRue, etc.). > > I was thinking that it is not necessary to introduce these kind of rules. > > You could have only one rule, i.e., PhysicaExchangeRule, for enforcing a > > distribution that would be applied when there is a requirement for > > particular distribution. > > > > So for the initial plan > > LogicalRoot [distribution=SINGLETON] > > -> LogicalProject > > -> LogicalFilter > > -> LogicalScan > > > > The ruleset above should generate the following alternatives where the > > PhysicalExchangeRule is applied 3 times. > > > > Alt1: > > PhysicalRoot > > -> SingletonExchange > > -> PhysicalProject > > -> PhysicalFilter > > -> PhysicalScan > > > > Alt2: > > PhysicalRoot > > -> PhysicalProject > > -> SingletonExchange > > -> PhysicalFilter > > -> PhysicalScan > > > > Alt3: > > PhysicalRoot > > -> PhysicalProject > > -> PhysicalFilter > > -> SingletonExchange > > -> PhysicalScan > > > > This is still in the same spirit of the previous example where the > physical > > property (distribution) is either satisfied immediately or passed down to > > the next level. > > > > Best, > > Stamatis > > > > On Thu, Nov 14, 2019 at 11:14 AM Vladimir Ozerov <ppoze...@gmail.com> > > wrote: > > > > > search place* = search space > > > > > > чт, 14 нояб. 2019 г. в 13:10, Vladimir Ozerov <ppoze...@gmail.com>: > > > > > > > Hi Haisheng, > > > > > > > > I double-checked the code. My original version returned false for > some > > > > cases, but it didn't affect number of rules calls anyway, so I > changed > > it > > > > to always return true. Please note that if I change the code as you > > > > suggested, the test started failing, because bottom-up propagation of > > > rule > > > > calls no longer work: when the child is converted to physical form, > the > > > > parent logical node is not notified. This is the very problem I > address > > > > with that weird physical-to-logical conversions: they do not make > > sense, > > > > and converter expansion does not produce any new rels, but their > > > existence > > > > allow for logical rule re-trigger which ultimately allow the plan to > > > > compile. > > > > > > > > Regarding two conventions - I agree that it may look strange, but I > do > > > not > > > > see any problems from the correctness perspective. Separation of > > logical > > > > and physical planning helps me avoid excessive expansion of the > search > > > > place: I do not want my physical rules to produce new rels from not > > > > optimized logical rels. Do you see any problems with that approach? > > > > > > > > Regards, > > > > Vladimir > > > > > > > > ср, 6 нояб. 2019 г. в 03:52, Haisheng Yuan <h.y...@alibaba-inc.com>: > > > > > > > >> Hi Vladimir, > > > >> > > > >> The code in PHYSICAL convention L44 looks weird, I think it always > > > >> returns true. > > > >> > > > >> > > > > > > https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/HazelcastConventions.java#L44 > > > >> > > > >> Try this: > > > >> > > > >> fromTraits.containsIfApplicable(Convention.PHYSICAL) > > > >> && toTraits.containsIfApplicable(Convention.PHYSICAL); > > > >> > > > >> > > > >> Adding a AbstractConverter on logical operators is meaningless. > > Calcite > > > >> is mixing the concept of logical and physical together, which is > sad. > > > >> > > > >> BTW, using 2 conventions is not appropriate and wrong. > > > >> > > > >> - Haisheng > > > >> > > > >> ------------------------------------------------------------------ > > > >> 发件人:Vladimir Ozerov<ppoze...@gmail.com> > > > >> 日 期:2019年11月05日 18:02:15 > > > >> 收件人:Haisheng Yuan<h.y...@alibaba-inc.com> > > > >> 抄 送:dev@calcite.apache.org (dev@calcite.apache.org)< > > > >> dev@calcite.apache.org> > > > >> 主 题:Re: Re: Problem with converters and possibly rule matching > > > >> > > > >> Hi Haisheng, > > > >> > > > >> think I already tried something very similar to what you explained, > > but > > > >> it gave not an optimal plan. Please let me describe what I did. I > > would > > > >> appreciate your feedback. > > > >> > > > >> 1) We start with a simple operator tree Root <- Project <- Scan, > where > > > >> the root is a final aggregator in the distributed query engine: > > > >> -> LogicalRoot > > > >> -> LogicalProject > > > >> -> LogicalScan > > > >> > > > >> 2) First, we convert the Root and enforce SINGLETON distribution on > a > > > >> child: > > > >> *-> PhysicalRoot[SINGLETON]* > > > >> * -> Enforcer#1[SINGLETON]* > > > >> -> LogicalProject > > > >> -> LogicalScan > > > >> > > > >> 3) Then the project's rule is invoked. It doesn't know the > > distribution > > > >> of the input, so it requests ANY distribution. Note that we have to > > set > > > ANY > > > >> to the project as well since we do not know the distribution of the > > > input: > > > >> -> PhysicalRoot[SINGLETON] > > > >> -> Enforcer#1[SINGLETON] > > > >> * -> PhysicalProject[ANY]* > > > >> * -> Enforcer#2[ANY]* > > > >> -> LogicalScan > > > >> > > > >> 4) Finally, the physical scan is created and its distribution is > > > >> resolved. Suppose that it is REPLICATED, i.e. the whole result set > is > > > >> located on all nodes. > > > >> -> PhysicalRoot[SINGLETON] > > > >> -> Enforcer#1[SINGLETON] > > > >> -> PhysicalProject[ANY] > > > >> -> Enforcer#2[ANY] > > > >> * -> PhysicalScan[REPLICATED]* > > > >> > > > >> 5) Now as all logical nodes are converted, we start resolving > > enforcers. > > > >> The second one is no-op, since REPLICATED satisfies ANY: > > > >> -> PhysicalRoot[SINGLETON] > > > >> -> Enforcer#1[SINGLETON] > > > >> -> PhysicalProject[ANY] > > > >> -> PhysicalScan[REPLICATED] > > > >> > > > >> 6) But the first enforcer now requires an Exchange, since ANY > doesn't > > > >> satisfy SINGLETON! > > > >> -> PhysicalRoot[SINGLETON] > > > >> * -> SingletonExchange[SINGLETON]* > > > >> -> PhysicalProject[ANY] // <= unresolved! > > > >> -> PhysicalScan[REPLICATED] > > > >> > > > >> The resulting plan requires data movement only because we didn't > know > > > >> precise distribution of the PhysicalProject when it was created. But > > > should > > > >> I enable Convention.Impl.canConvertConvention, bottom-up propagation > > > >> kicks in, and the correct plan is produced because now > LogicalProject > > > >> has a chance to be converted to PhysicalProject with the concrete > > > >> distribution. The optimized plan looks like this (since REPLICATED > > > >> satisfies SINGLETON): > > > >> -> PhysicalRoot[SINGLETON] > > > >> -> PhysicalProject[REPLICATED] > > > >> -> PhysicalScan[REPLICATED] > > > >> > > > >> You may see this in action in my reproducer: > > > >> 1) Test producing "bad" plan: > > > >> > > > > > > https://github.com/devozerov/calcite-optimizer/blob/master/src/test/java/devozerov/OptimizerTest.java#L45 > > > >> 2) Root enforces SINGLETON on Project: > > > >> > > > > > > https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/RootPhysicalRule.java#L45 > > > >> 3) Project enforces default (ANY) distribution on Scan: > > > >> > > > > > > https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/ProjectPhysicalRule.java#L49 > > > >> > > > >> Please let me know if this flow is similar to what you meant. > > > >> > > > >> Regards, > > > >> Vladimir. > > > >> > > > >> пн, 4 нояб. 2019 г. в 10:33, Haisheng Yuan <h.y...@alibaba-inc.com > >: > > > >> > > > >>> Hi Vladimir, > > > >>> > > > >>> This is still can be done through top-down request approach. > > > >>> > > > >>> PhysicalFilter operator should request ANY distribution from child > > > >>> operator, unless there is outer reference in the filter condition, > in > > > which > > > >>> case, PhysicalFilter should request SINGLETON or BROADCAST > > > distribution. So > > > >>> in your case, PhysicalFilter request ANY, its required distribution > > > will be > > > >>> enforced on filter's output. > > > >>> > > > >>> Regarding index usage, you should have a FIlterTableScan2IndexGet > > > >>> logical transformation rule, and a IndexGet2IndexScan physical > > > >>> implementation rule. Note that IndexGet is a logical operator and > > > IndexScan > > > >>> is a physical operator, which are also used by SQL Server. > > > >>> > > > >>> - Haisheng > > > >>> > > > >>> ------------------------------------------------------------------ > > > >>> 发件人:Vladimir Ozerov<ppoze...@gmail.com> > > > >>> 日 期:2019年11月01日 17:30:26 > > > >>> 收件人:<dev@calcite.apache.org> > > > >>> 主 题:Re: Problem with converters and possibly rule matching > > > >>> > > > >>> Hi Stamatis, > > > >>> > > > >>> Thank you for your reply. I also thought that we may set the > > > distribution > > > >>> trait during logical planning because it is known in advance. And > the > > > >>> example I gave will work! :-) But unfortunately, it will work only > > > >>> because > > > >>> the tree is very simple, and the Project is adjacent to the Scan. > > This > > > is > > > >>> how my reproducer will work in that case: > > > >>> 1) Root: enforce "SINGLETON" on Project > > > >>> 2) Project: check the logical Scan, infer already resolved > > > distribution, > > > >>> then convert to [PhyiscalProject <- PhysicalScan] > > > >>> 3) Resolve Root enforcer, adding and Exchange if needed. > > > >>> > > > >>> But this stops working as soon as a plan becomes more complex so > that > > > it > > > >>> is > > > >>> impossible to infer the distribution from the child immediately. > > E.g.: > > > >>> LogicalRoot [distribution=SINGLETON] > > > >>> -> LogicalProject // We are here new and cannot produce the > physical > > > >>> project > > > >>> -> LogicalFilter[distribution=?] > > > >>> -> LogicalScan[distribution=REPLICATED] > > > >>> > > > >>> This is where your suggestion with cascading enforcement may kick > in. > > > But > > > >>> now consider that instead of having REPLICATED distribution, which > > > >>> satisfies SINGLETON, we have a PARTITIONED distribution, It doesn't > > > >>> satisfy > > > >>> SINGLETON, so we have to insert the exchange. > > > >>> > > > >>> Before: > > > >>> PhysicalRoot [SINGLETON] > > > >>> -> PhysicalProject [SINGLETON] > > > >>> -> PhysicalFilter [SINGLETON] > > > >>> -> LogicalScan [PARTITIONED] // SINGLETON is enforced here > > > >>> > > > >>> After: > > > >>> PhysicalRoot [SINGLETON] > > > >>> -> PhysicalProject [SINGLETON] > > > >>> -> PhysicalFilter [SINGLETON] > > > >>> -> SingletonExchange [SINGLETON] > > > >>> -> PhysicalScan [PARTITIONED] > > > >>> > > > >>> But unfortunately, this plan is not optimal. Since we perform > Project > > > and > > > >>> Filter after the exchange, we now have to reshuffle the whole > table. > > > The > > > >>> optimal plan would be to pushdown Project and Filter past exchange: > > > >>> PhysicalRoot [SINGLETON] > > > >>> -> SingletonExchange [SINGLETON] > > > >>> -> PhysicalProject [PARTITIONED] > > > >>> -> PhysicalFilter [PARTITIONED] > > > >>> -> PhysicalScan [PARTITIONED] > > > >>> > > > >>> But in order to achieve that, now I need to introduce new rules > which > > > >>> will > > > >>> attempt to transpose dozens of different operators past exchange, > so > > > most > > > >>> likely the planning will never finish :-) > > > >>> > > > >>> This is why it seems that the bottom-up approach seems to be more > > > natural > > > >>> for such cases. Another example is index support. A table may have > a > > > >>> number > > > >>> of access methods in addition to plain scan, and every such method > > may > > > >>> produce a physical scan with different collations. It is not > > practical > > > to > > > >>> try to enforce something from the top. Instead, we'd better produce > > > >>> alternatives from the bottom. > > > >>> > > > >>> Basically, Drill tries to mix two approaches. First, it tries to > > derive > > > >>> traits from the child. If it fails and no transformations were > > created, > > > >>> it > > > >>> just stops trait propagation. As a result, an exchange is created > on > > > top > > > >>> of > > > >>> the operator where we stopped, which again leads to not optimal > > plans. > > > >>> You > > > >>> may observe this in action if you run my reproducer. "testPass" > > > produces > > > >>> an > > > >>> optimal plan with help of abstract converters. "testDrill" produces > > not > > > >>> optimal plan with the unnecessary exchange. > > > >>> > > > >>> All in all, I cannot get how we can employ distribution metadata > and > > > >>> index > > > >>> metadata for optimal planning without using a pure bottom-up > > approach. > > > >>> > > > >>> Regards, > > > >>> Vladimir. > > > >>> > > > >>> пт, 1 нояб. 2019 г. в 11:42, Stamatis Zampetakis < > zabe...@gmail.com > > >: > > > >>> > > > >>> > The discussion is interesting thanks for the nice examples. > > > >>> > I am replying in this thread since it has all the examples and > the > > > >>> history > > > >>> > of the conversation. > > > >>> > > > > >>> > *Part I: Distribution physical or logical property* > > > >>> > > > > >>> > The Volcano paper writes the following regarding properties: > > > >>> > > > > >>> > "Logical properties can be derived from the logical algebra > > > expression > > > >>> and > > > >>> > include schema, expected size, etc., while physical properties > > depend > > > >>> on > > > >>> > algorithms, e.g., sort order, partitioning, etc." > > > >>> > > > > >>> > We could say that partitioning is not only a property which > depends > > > on > > > >>> an > > > >>> > algorithm but a property which depends on the schema. When you > > have a > > > >>> table > > > >>> > you know in advance that the particular table is replicated. > > > >>> > > > > >>> > Basically, this means that the starting "logical" plan in your > case > > > >>> could > > > >>> > be the following: > > > >>> > RootLogicalRel[convention=LOGICAL, distribution=ANY] > > > >>> > -> ProjectLogicalRel[convention=LOGICAL, distribution=ANY] > > > >>> > -> MapScanLogicalRel[convention=LOGICAL, distribution=REPLICATED] > > > >>> > > > > >>> > When you are about to create your physical projection it is not > > > >>> necessary > > > >>> > to create three equivalent paths but the most promising by asking > > the > > > >>> input > > > >>> > what can it offer as traits. Again this partially overlaps with > the > > > >>> > discussion about "On demand traitset" [1] where as I wrote there > is > > > >>> already > > > >>> > a mechanism to derive traits from children (collation, > > distribution, > > > >>> etc.). > > > >>> > > > > >>> > Note that I am not saying that there is nothing more to be done > to > > > >>> improve > > > >>> > the Calcite optimizer; I am just trying to provide a viable > > > alternative > > > >>> > given the current situation of the project. > > > >>> > > > > >>> > *Part II: Example with the original Volcano optimizer* > > > >>> > > > > >>> > Now let me try to show how the original (from the paper) Volcano > > > >>> optimizer > > > >>> > could handle your query. > > > >>> > > > > >>> > The optimization starts and we request two properties convention > > and > > > >>> > distribution. > > > >>> > > > > >>> > RootLogicalRel[convention=PHYSICAL, distribution=SINGLETON] > > > >>> > -> ProjectLogicalRel > > > >>> > -> MapScanLogicalRel > > > >>> > > > > >>> > Rule 1: RootLogicalRel -> RootPhysicalRel applies the algorithm > but > > > >>> cannot > > > >>> > satisfy the SINGLETON property so it propagates the demand > > > >>> > > > > >>> > RootPhysicalRel > > > >>> > -> ProjectLogicalRel [convention=PHYSICAL, > distribution=SINGLETON] > > > >>> > -> MapScanLogicalRel > > > >>> > > > > >>> > Rule 2: ProjectLogicalRel -> ProjectPhysicalRel applies the > > algorithm > > > >>> but > > > >>> > cannot satisfy the SINGLETON property so it propagates the demand > > > >>> > > > > >>> > RootPhysicalRel > > > >>> > -> ProjectPhysicalRel > > > >>> > -> MapScanLogicalRel [convention=PHYSICAL, > distribution=SINGLETON] > > > >>> > > > > >>> > Rule 3: MapScanLogicalRel -> MapScanPhysicalRel applies the > > algorithm > > > >>> and > > > >>> > given that the table is replicated it satisfies the distribution > > > >>> property. > > > >>> > > > > >>> > RootPhysicalRel > > > >>> > -> ProjectPhysicalRel > > > >>> > -> MapScanPhysicalRel > > > >>> > > > > >>> > So we end up with a complete plan that satisfies all properties > and > > > >>> does > > > >>> > not have redundant exchange operators. > > > >>> > Moreover we didn't require all possible distributions when we > > applied > > > >>> Rule > > > >>> > 2 since we just want to satisfy the SINGLETON property requested > by > > > the > > > >>> > parent. > > > >>> > The example above does not demonstrate the additional > optimization > > > >>> paths > > > >>> > which would be apply an enforcer (an Exchange operator to satisfy > > the > > > >>> > SINGLETON property) at a higher level than the scan. > > > >>> > > > > >>> > Would this be an acceptable approach? Does it still suffer from a > > big > > > >>> > search space? > > > >>> > > > > >>> > To make the analog with the actual Volcano optimizer in Calcite > the > > > >>> thing > > > >>> > that may be missing is to be able to know what trait was > requested > > by > > > >>> the > > > >>> > parent during the application of Rule 2. > > > >>> > > > > >>> > Best, > > > >>> > Stamatis > > > >>> > > > > >>> > [1] > > > >>> > > > > >>> > > > > >>> > > > > > > https://lists.apache.org/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E > > > >>> > > > > >>> > On Wed, Oct 30, 2019 at 7:24 PM Vladimir Ozerov < > > ppoze...@gmail.com> > > > >>> > wrote: > > > >>> > > > > >>> > > Hi Stamatis, > > > >>> > > > > > >>> > > The problem that the presented reproducer is a very simplified > > > >>> version of > > > >>> > > what is actually needed. In reality, there are several > > distribution > > > >>> > types - > > > >>> > > PARTITIONED, REPLICATED, SINGLETON. To make things worse, > > > >>> PARTITIONED may > > > >>> > > or may not have concrete distribution fields. In theory, I can > > > >>> create one > > > >>> > > transformation per distribution type, but that would increase > > plan > > > >>> space > > > >>> > > significantly. In my sample "Root <- Project <- Scan" plan, > there > > > is > > > >>> no > > > >>> > > room for optimization at all, we only need to convert the nodes > > and > > > >>> > > propagate traits. But if I follow the proposed approach, the > > > planner > > > >>> > would > > > >>> > > create three equivalent paths. For complex queries, this may > > > increase > > > >>> > > optimization time significantly. > > > >>> > > > > > >>> > > What I need instead is to gradually convert and optimize nodes > > > >>> > *bottom-up*, > > > >>> > > instead of top-bottom. That is, create a physical scan first, > > then > > > >>> > create a > > > >>> > > physical project on top of it, etc. But at the same time, some > > > rules > > > >>> > still > > > >>> > > require the top-bottom approach. So essentially I need the > > > optimizer > > > >>> to > > > >>> > do > > > >>> > > both. Abstract converters help me establish bottom-up > preparation > > > >>> but do > > > >>> > > this at the cost of considering too many trait pairs, and as a > > > >>> result, > > > >>> > also > > > >>> > > pollute the search space. > > > >>> > > > > > >>> > > To the contrast, precise command "I transformed the node A to > > node > > > B, > > > >>> > > please re-trigger the rules for A's parents" would allow us to > > > >>> re-trigger > > > >>> > > only required rules, without adding more nodes. > > > >>> > > > > > >>> > > Does it make sense? > > > >>> > > > > > >>> > > Regards, > > > >>> > > Vladimir. > > > >>> > > > > > >>> > > ср, 30 окт. 2019 г. в 21:02, Stamatis Zampetakis < > > > zabe...@gmail.com > > > >>> >: > > > >>> > > > > > >>> > > > I admit that I didn't thoroughly read the exchanges so far > but > > > >>> forcing > > > >>> > > the > > > >>> > > > rules trigger does not seem the right approach. > > > >>> > > > > > > >>> > > > Other than that I would like to backtrack a bit to point 4.3 > > > >>> raised by > > > >>> > > > Vladimir. > > > >>> > > > > > > >>> > > > "ProjectPhysicalRule [6] - transforms logical project to > > physical > > > >>> > > > project *ONLY* if there is an underlying physical input with > > > >>> REPLICATED > > > >>> > > or > > > >>> > > > SINGLETON distribution" > > > >>> > > > > > > >>> > > > The rule could be modified to do the following two > > > transformations: > > > >>> > > > 1. Create a physical project and require the input to be > > > >>> REPLICATED. > > > >>> > > > 2. Create a physical project and require > > > >>> > > > the input to be SINGLETON. > > > >>> > > > > > > >>> > > > I would assume that afterwards when your scan rule fires it > > > should > > > >>> go > > > >>> > to > > > >>> > > > the appropriate subset and give you back the desired plan. Is > > > >>> there a > > > >>> > > > problem with this approach? > > > >>> > > > > > > >>> > > > Best, > > > >>> > > > Stamatis > > > >>> > > > > > > >>> > > > On Wed, Oct 30, 2019, 5:52 PM Seliverstov Igor < > > > >>> gvvinbl...@gmail.com> > > > >>> > > > wrote: > > > >>> > > > > > > >>> > > > > Unfortunately it requires package-private API usage. > > > >>> > > > > > > > >>> > > > > Best solution there if Calcite provide it for end users. > > > >>> > > > > > > > >>> > > > > ср, 30 окт. 2019 г., 18:48 Vladimir Ozerov < > > ppoze...@gmail.com > > > >: > > > >>> > > > > > > > >>> > > > > > “e pension” == “expand conversion” :) > > > >>> > > > > > > > > >>> > > > > > ср, 30 окт. 2019 г. в 18:46, Vladimir Ozerov < > > > >>> ppoze...@gmail.com>: > > > >>> > > > > > > > > >>> > > > > > > Yes, that may work. Even if e pension rule is used, for > > the > > > >>> most > > > >>> > > > cases > > > >>> > > > > it > > > >>> > > > > > > will not trigger any real conversions, since we are > > moving > > > >>> from > > > >>> > > > > abstract > > > >>> > > > > > > convention to physical, and created converters will > have > > > the > > > >>> > > opposite > > > >>> > > > > > trait > > > >>> > > > > > > direction (from physical to abstract). > > > >>> > > > > > > > > > >>> > > > > > > But again - ideally we only need to re-trigger the > rules > > > for > > > >>> a > > > >>> > > > specific > > > >>> > > > > > > node, no more than that. So API support like > > > >>> > > > > > > “VolcanoPlanner.forceRules(RelNode)” would be very > > > >>> convenient. > > > >>> > > > > > > > > > >>> > > > > > > What do you think? > > > >>> > > > > > > > > > >>> > > > > > > ср, 30 окт. 2019 г. в 17:56, Seliverstov Igor < > > > >>> > > gvvinbl...@gmail.com > > > >>> > > > >: > > > >>> > > > > > > > > > >>> > > > > > >> I considered manual rules calling too, for now we use > > > >>> abstract > > > >>> > > > > > converters > > > >>> > > > > > >> + > > > >>> > > > > > >> ExpandConversionRule for exchanges producing. > > > >>> > > > > > >> > > > >>> > > > > > >> You may create such converters manually (checking > > > >>> appropriate > > > >>> > > > subset) > > > >>> > > > > > this > > > >>> > > > > > >> case you may reduce created converters count, also, a > > > >>> converter > > > >>> > > is a > > > >>> > > > > > quite > > > >>> > > > > > >> special node, that does almost nothing (without > > > >>> corresponding > > > >>> > > rule) > > > >>> > > > it > > > >>> > > > > > may > > > >>> > > > > > >> be used just as a rule trigger. > > > >>> > > > > > >> > > > >>> > > > > > >> Regards, > > > >>> > > > > > >> Igor > > > >>> > > > > > >> > > > >>> > > > > > >> ср, 30 окт. 2019 г., 17:31 Vladimir Ozerov < > > > >>> ppoze...@gmail.com > > > >>> > >: > > > >>> > > > > > >> > > > >>> > > > > > >> > One funny hack which helped me is manual > registration > > > of a > > > >>> > fake > > > >>> > > > > > RelNode > > > >>> > > > > > >> > with desired traits through > VolcanoPlanner.register() > > > >>> method. > > > >>> > > But > > > >>> > > > > > again, > > > >>> > > > > > >> > this leads to trashing. What could really help is a > > call > > > >>> to > > > >>> > > > > > >> > VolcanoPlanner.fireRules() with desired rel. But > this > > > >>> doesn't > > > >>> > > work > > > >>> > > > > out > > > >>> > > > > > >> of > > > >>> > > > > > >> > the box since some internals of the rule queue needs > > to > > > be > > > >>> > > > adjusted. > > > >>> > > > > > >> > > > > >>> > > > > > >> > What does the community think about adding a method > > > which > > > >>> will > > > >>> > > > > re-add > > > >>> > > > > > >> rules > > > >>> > > > > > >> > applicable to the specific RelNode to the rule > queue? > > > >>> > > > > > >> > > > > >>> > > > > > >> > ср, 30 окт. 2019 г. в 17:00, Vladimir Ozerov < > > > >>> > > ppoze...@gmail.com > > > >>> > > > >: > > > >>> > > > > > >> > > > > >>> > > > > > >> > > Hi Igor, > > > >>> > > > > > >> > > > > > >>> > > > > > >> > > Yes, I came to the same conclusion, thank you. > This > > is > > > >>> how > > > >>> > it > > > >>> > > > > > >> basically > > > >>> > > > > > >> > > happens when converters are disabled: > > > >>> > > > > > >> > > 1) We start with initial tree: [LogicalProject] <- > > > >>> > > [LogicalScan] > > > >>> > > > > > >> > > 2) Then we convert LogicalScan to PhysicalScan, so > > it > > > is > > > >>> > added > > > >>> > > > to > > > >>> > > > > > the > > > >>> > > > > > >> > > set: [LogicalProject] <- [LogicalScan, > PhysicalScan] > > > >>> > > > > > >> > > 3) Finally, when it is time to fire a rule for > > > >>> PhysicalScan, > > > >>> > > we > > > >>> > > > > try > > > >>> > > > > > to > > > >>> > > > > > >> > get > > > >>> > > > > > >> > > parents of that scan set with traits of the > > > >>> PhysicalScan. > > > >>> > > Since > > > >>> > > > > > there > > > >>> > > > > > >> are > > > >>> > > > > > >> > > no such parents (we skipped it intentionally), the > > > rule > > > >>> is > > > >>> > not > > > >>> > > > > > queued. > > > >>> > > > > > >> > > > > > >>> > > > > > >> > > But when converters are enabled, a converter rel > is > > > >>> created: > > > >>> > > > > > >> > [LogicalProject] > > > >>> > > > > > >> > > <- [LogicalScan, PhysicalScan, > > > >>> > > ConverterFromPhysicalToLogical]. > > > >>> > > > No > > > >>> > > > > > >> rules > > > >>> > > > > > >> > > are fired for PhysicalScan again, but they are > fired > > > for > > > >>> > > > converter > > > >>> > > > > > >> since > > > >>> > > > > > >> > > it has the necessary LOGICAL trait. > > > >>> > > > > > >> > > > > > >>> > > > > > >> > > It makes sense, that converters essentially allow > > > >>> forcing > > > >>> > rule > > > >>> > > > > > >> invocation > > > >>> > > > > > >> > > on parents, even if the child was created with > > > different > > > >>> > > traits. > > > >>> > > > > But > > > >>> > > > > > >> it > > > >>> > > > > > >> > > seems that this mechanism may be too heavy for > > complex > > > >>> > queries > > > >>> > > > > > >> because it > > > >>> > > > > > >> > > literally creates hundreds of new converter rels > and > > > >>> > triggers > > > >>> > > > > rules > > > >>> > > > > > >> over > > > >>> > > > > > >> > > and over again. > > > >>> > > > > > >> > > > > > >>> > > > > > >> > > We need some fine-grained alternative. Basically, > > what > > > >>> would > > > >>> > > be > > > >>> > > > > > enough > > > >>> > > > > > >> > for > > > >>> > > > > > >> > > me is to let the planner know somehow: "I created > > that > > > >>> rel, > > > >>> > > and > > > >>> > > > I > > > >>> > > > > > want > > > >>> > > > > > >> > you > > > >>> > > > > > >> > > to execute parent rules not only using its trait > but > > > >>> also on > > > >>> > > > this > > > >>> > > > > > and > > > >>> > > > > > >> > those > > > >>> > > > > > >> > > traits." > > > >>> > > > > > >> > > Is there any API in Calcite which allows doing > this > > > >>> without > > > >>> > > > > > creating a > > > >>> > > > > > >> > new > > > >>> > > > > > >> > > rel node? > > > >>> > > > > > >> > > > > > >>> > > > > > >> > > Regards, > > > >>> > > > > > >> > > Vladimir. > > > >>> > > > > > >> > > > > > >>> > > > > > >> > > > > > >>> > > > > > >> > > ср, 30 окт. 2019 г. в 09:25, Seliverstov Igor < > > > >>> > > > > gvvinbl...@gmail.com > > > >>> > > > > > >: > > > >>> > > > > > >> > > > > > >>> > > > > > >> > >> Vladimir, > > > >>> > > > > > >> > >> > > > >>> > > > > > >> > >> Probably it'll help you: > > > >>> > > > > > >> > >> > > > >>> > > > > > >> > >> Seems the cause of issue in > > RelSubset.getParentRels() > > > >>> The > > > >>> > > > check > > > >>> > > > > > used > > > >>> > > > > > >> > when > > > >>> > > > > > >> > >> the planner schedules newly matched rules after > > > >>> successful > > > >>> > > > > > >> > transformation > > > >>> > > > > > >> > >> (see VolcanoRuleCall.matchRecurse), it prevents > the > > > >>> parent > > > >>> > > rule > > > >>> > > > > be > > > >>> > > > > > >> > applied > > > >>> > > > > > >> > >> once again (here your logical project with an > input > > > >>> having > > > >>> > > ANY > > > >>> > > > > > >> > >> distribution > > > >>> > > > > > >> > >> doesn't satisfy a transformed input traits). > > > >>> > > > > > >> > >> > > > >>> > > > > > >> > >> In our case we use another workaround, so there > are > > > >>> also > > > >>> > much > > > >>> > > > > more > > > >>> > > > > > >> > >> transformations than we wanted, so the desired > rule > > > is > > > >>> > > > triggered. > > > >>> > > > > > >> > >> > > > >>> > > > > > >> > >> > > > >>> > > > > > >> > >> вт, 29 окт. 2019 г., 14:46 Vladimir Ozerov < > > > >>> > > ppoze...@gmail.com > > > >>> > > > >: > > > >>> > > > > > >> > >> > > > >>> > > > > > >> > >> > Hi Vladimir, > > > >>> > > > > > >> > >> > > > > >>> > > > > > >> > >> > I am sorry. Pushed, it works now. > > > >>> > > > > > >> > >> > > > > >>> > > > > > >> > >> > вт, 29 окт. 2019 г. в 14:41, Vladimir Sitnikov > < > > > >>> > > > > > >> > >> > sitnikov.vladi...@gmail.com > > > >>> > > > > > >> > >> > >: > > > >>> > > > > > >> > >> > > > > >>> > > > > > >> > >> > > > mvn clean test > > > >>> > > > > > >> > >> > > > > > >>> > > > > > >> > >> > > [ERROR] The goal you specified requires a > > project > > > >>> to > > > >>> > > > execute > > > >>> > > > > > but > > > >>> > > > > > >> > >> there is > > > >>> > > > > > >> > >> > > no POM in this directory > > > >>> > > > > > >> > >> > > > > > >>> > > > > > >> > >> > > Vladimir, please push missing files > > > >>> > > > > > >> > >> > > > > > >>> > > > > > >> > >> > > Vladimir > > > >>> > > > > > >> > >> > > > > > >>> > > > > > >> > >> > > > > >>> > > > > > >> > >> > > > >>> > > > > > >> > > > > > >>> > > > > > >> > > > > >>> > > > > > >> > > > >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > > >> > > > > > >