Hi Vladimir, If I understand it correctly, your concern is still about [1], right? If it is, I think I have answered it in [2], perhaps it didn't jump into your inbox.
I am still not convinced that enforcing derivation on all rels is the only way to solve your concern. If it is the other problem, can you give us a concrete example? So we can discuss and solve it. Thanks, Haisheng Yuan [1] https://lists.apache.org/thread/4rcvlk1oprbnbgbnwl2s735p99h4vj80 [2] https://lists.apache.org/thread/s17tsj3pmccfrfvydnmv76btc3voxohj On 2022/02/13 09:40:05 Roman Kondakov wrote: > Hi Alessandro, > > this problem was already discussed on dev-list [1] and we have a ticket > for this [2]. > > My concern is that many projects use Calcite as a Lego kit: they took > internal components of Calcite and combine them for building a custom > planning and execution pipeline. And sometimes downstream projects need > to change the default behavior of internal components to fit their > requirements or overcome the bug. So the idea of keeping even internal > components of Calcite "more public" is rather a good thing than the bad > one from my point of view. > > Thank you. > > [1] https://lists.apache.org/thread/cykl74dcphgow4790fwoc8frsjglz7n1 > > [2] https://issues.apache.org/jira/browse/CALCITE-4542 > > -- > Roman Kondakov > > > On 11.02.2022 19:15, Alessandro Solimando wrote: > > Hello everyone, > > @Vladimir, +1 on the change introducing "enforceDerive()". > > > > @Roman, could you walk us through the limitations you found that forced you > > to copy-paste the whole class? > > > > Maybe there is some middle ground for your problem(s) too, similar in > > spirit to what Vladimir proposed for the other limitation. > > > > I am not against making the class more public if necessary, but it would be > > nice to have a discussion here before going down that path. > > If the discussion leads to a better design of the original class, all > > projects would benefit from that. > > > > Best regards, > > Alessandro > > > > On Fri, 11 Feb 2022 at 04:14, Roman Kondakov <kondako...@mail.ru.invalid> > > wrote: > > > >> Hi Vladimir, > >> > >> +1 for making the rule driver more public. We've faced similar problems > >> in the downstream project. The solution was to copy and paste the > >> TopDownRuleDrive code with small fixes since it was not possible to > >> override the default behavior. > >> > >> -- > >> Roman Kondakov > >> > >> > >> On 11.02.2022 02:50, Vladimir Ozerov wrote: > >>> Hi, > >>> > >>> In the Cascades driver, it is possible to propagate the requests top-down > >>> using the "passThrough", method and then notify parents bottom-up about > >> the > >>> concrete physical implementations of inputs using the "derive" method. > >>> > >>> In some optimizers, the valid parent node cannot be created before the > >>> trait sets of inputs are known. An example is a custom distribution trait > >>> that includes the number of shards in the system. The parent operator > >> alone > >>> may guess the distribution keys, but cannot know the number of input > >>> shards. To mitigate this, you may create a "template" node with an > >> infinite > >>> cost from within the optimization rule that will propagate the > >>> passThrough/drive calls but would never participate in the final plan. > >>> > >>> Currency, the top-down driver designed in a way that the nodes created > >> from > >>> the "passThrough" method are not notified on the "derive" stage. This > >> leads > >>> to the incomplete exploration of the search space. For example, the rule > >>> may produce the node "A1.template" that will be converted into a normal > >>> "A1" node in the derive phase. However, if the parent operator produced > >>> "A2.template" from "A1.template" using pass-through mechanics, the > >>> "A2.template" will never be notified about the concrete input traits, > >>> possibly losing the optimal plan. This is especially painful in > >> distributed > >>> engines, where the number of shards is important for the placement of > >>> Shuffle operators. > >>> > >>> It seems that the problem could be solved with relatively low effort. The > >>> "derive" is not invoked on the nodes created from the "passThrough" > >> method, > >>> because such nodes are placed in the "passThroughCache" collection. > >> Instead > >>> of doing this unconditionally, we may introduce an additional predicate > >>> that would selectively enforce "derive" on such nodes. For example, this > >>> could be a default method in the PhysicalNode interface, like: > >>> > >>> interface PhysicalNode { > >>> default boolean enforceDerive() { return false; } > >>> } > >>> > >>> If there are no objections, I'll proceed with this change. > >>> > >>> Alternatively, we may make the TopDownRuleDriver more "public", so that > >> the > >>> user can extend it and decide within the driver whether to cache a > >>> particular node or not. > >>> > >>> I would appreciate your feedback on the matter. > >>> > >>> Regards, > >>> Vladimir. > >>> >