Hello everyone, @Vladimir, +1 on the change introducing "enforceDerive()".
@Roman, could you walk us through the limitations you found that forced you to copy-paste the whole class? Maybe there is some middle ground for your problem(s) too, similar in spirit to what Vladimir proposed for the other limitation.

I am not against making the class more public if necessary, but it would be nice to have a discussion here before going down that path. If the discussion leads to a better design of the original class, all projects would benefit from that.

Best regards,
Alessandro

On Fri, 11 Feb 2022 at 04:14, Roman Kondakov <kondako...@mail.ru.invalid> wrote:

> Hi Vladimir,
>
> +1 for making the rule driver more public. We've faced similar problems
> in the downstream project. The solution was to copy and paste the
> TopDownRuleDriver code with small fixes since it was not possible to
> override the default behavior.
>
> --
> Roman Kondakov
>
>
> On 11.02.2022 02:50, Vladimir Ozerov wrote:
> > Hi,
> >
> > In the Cascades driver, it is possible to propagate the requests
> > top-down using the "passThrough" method, and then notify parents
> > bottom-up about the concrete physical implementations of inputs using
> > the "derive" method.
> >
> > In some optimizers, the valid parent node cannot be created before the
> > trait sets of inputs are known. An example is a custom distribution
> > trait that includes the number of shards in the system. The parent
> > operator alone may guess the distribution keys, but cannot know the
> > number of input shards. To mitigate this, you may create a "template"
> > node with an infinite cost from within the optimization rule that will
> > propagate the passThrough/derive calls but would never participate in
> > the final plan.
> >
> > Currently, the top-down driver is designed in a way that the nodes
> > created from the "passThrough" method are not notified in the "derive"
> > stage. This leads to incomplete exploration of the search space.
> > For example, the rule may produce the node "A1.template" that will be
> > converted into a normal "A1" node in the derive phase. However, if the
> > parent operator produced "A2.template" from "A1.template" using
> > pass-through mechanics, the "A2.template" will never be notified about
> > the concrete input traits, possibly losing the optimal plan. This is
> > especially painful in distributed engines, where the number of shards
> > is important for the placement of Shuffle operators.
> >
> > It seems that the problem could be solved with relatively low effort.
> > The "derive" method is not invoked on the nodes created from the
> > "passThrough" method, because such nodes are placed in the
> > "passThroughCache" collection. Instead of doing this unconditionally,
> > we may introduce an additional predicate that would selectively enforce
> > "derive" on such nodes. For example, this could be a default method in
> > the PhysicalNode interface, like:
> >
> > interface PhysicalNode {
> >   default boolean enforceDerive() { return false; }
> > }
> >
> > If there are no objections, I'll proceed with this change.
> >
> > Alternatively, we may make the TopDownRuleDriver more "public", so that
> > the user can extend it and decide within the driver whether to cache a
> > particular node or not.
> >
> > I would appreciate your feedback on the matter.
> >
> > Regards,
> > Vladimir.
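
For readers following the thread, the proposal quoted above can be illustrated with a small toy model in Java. This is only a sketch under stated assumptions, not the real TopDownRuleDriver: ToyDriver, PlainNode, TemplateNode, onPassThroughCreated and willDerive are hypothetical names invented for illustration, and the PhysicalNode interface here is a stand-in that carries nothing but the proposed enforceDerive() default method. The sketch assumes that a node placed in "passThroughCache" is skipped in the derive stage, as the thread describes, and shows the cache insertion becoming conditional on the new predicate.

```java
import java.util.HashSet;
import java.util.Set;

class EnforceDeriveSketch {

    // Hypothetical stand-in for the physical node interface. Only the
    // default method mirrors the proposal from the thread.
    interface PhysicalNode {
        default boolean enforceDerive() { return false; }
    }

    // An ordinary node: keeps the default, so it is cached after pass-through.
    static final class PlainNode implements PhysicalNode { }

    // A "template" node that still needs the concrete input traits.
    static final class TemplateNode implements PhysicalNode {
        @Override public boolean enforceDerive() { return true; }
    }

    // Toy driver (an assumption for illustration, not TopDownRuleDriver).
    static final class ToyDriver {
        // Nodes in this set are skipped in the derive stage.
        private final Set<PhysicalNode> passThroughCache = new HashSet<>();

        // Called for every node created from "passThrough". The node is
        // cached only if it does not ask for "derive" to be enforced.
        void onPassThroughCreated(PhysicalNode node) {
            if (!node.enforceDerive()) {
                passThroughCache.add(node);
            }
        }

        // True if the derive stage would still notify this node.
        boolean willDerive(PhysicalNode node) {
            return !passThroughCache.contains(node);
        }
    }

    public static void main(String[] args) {
        ToyDriver driver = new ToyDriver();
        PhysicalNode plain = new PlainNode();
        PhysicalNode template = new TemplateNode();
        driver.onPassThroughCreated(plain);
        driver.onPassThroughCreated(template);
        System.out.println("plain derives: " + driver.willDerive(plain));       // false
        System.out.println("template derives: " + driver.willDerive(template)); // true
    }
}
```

Because the default returns false, existing nodes would keep today's behavior in this model, and only nodes that override the method (such as an "A2.template" produced via pass-through) would be notified about concrete input traits.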