Hi,

In the Cascades driver, it is possible to propagate requests top-down using the "passThrough" method and then notify parents bottom-up about the concrete physical implementations of inputs using the "derive" method.
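
To illustrate the two phases, here is a minimal sketch. The types Dist and Filter are hypothetical, simplified stand-ins that I made up for this mail; they are not the real Calcite classes, and the real methods operate on trait sets rather than a single distribution object.

```java
// Hypothetical, simplified model; none of these types are the real Calcite classes.
final class PassThroughDeriveSketch {
    public static void main(String[] args) {
        Filter initial = new Filter(new Dist("any", 0));
        // Top-down: the parent requests hash[a]; the shard count is not known yet.
        Filter requested = initial.passThrough(new Dist("a", 0));
        // Bottom-up: the input turns out to be hashed by "a" over 4 shards.
        Filter derived = requested.derive(new Dist("a", 4));
        System.out.println(requested.dist + " -> " + derived.dist); // prints "hash[a]x0 -> hash[a]x4"
    }
}

/** Assumed distribution trait: a key plus a shard count (0 means "not known yet"). */
final class Dist {
    final String key;
    final int shards;
    Dist(String key, int shards) { this.key = key; this.shards = shards; }
    @Override public String toString() { return "hash[" + key + "]x" + shards; }
}

/** Assumed single-input operator that preserves the distribution of its input. */
final class Filter {
    final Dist dist;
    Filter(Dist dist) { this.dist = dist; }

    // Top-down: forward the parent's request to the input unchanged.
    Filter passThrough(Dist required) { return new Filter(required); }

    // Bottom-up: adopt the concrete distribution reported by the input.
    Filter derive(Dist inputDist) { return new Filter(inputDist); }
}
```

Note that only the derive call can supply the shard count, which is the piece of information the rest of this mail is about.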
In some optimizers, a valid parent node cannot be created before the trait sets of its inputs are known. An example is a custom distribution trait that includes the number of shards in the system. The parent operator alone may guess the distribution keys, but it cannot know the number of input shards. To mitigate this, you may create a "template" node with an infinite cost from within the optimization rule. Such a node propagates the passThrough/derive calls but never participates in the final plan.

Currently, the top-down driver is designed in such a way that nodes created from the "passThrough" method are not notified in the "derive" stage. This leads to incomplete exploration of the search space. For example, a rule may produce the node "A1.template", which is converted into a normal "A1" node in the derive phase. However, if the parent operator produced "A2.template" from "A1.template" using the pass-through mechanics, "A2.template" will never be notified about the concrete input traits, and the optimal plan may be lost. This is especially painful in distributed engines, where the number of shards is important for the placement of Shuffle operators.

It seems that the problem could be solved with relatively low effort. The "derive" method is not invoked on nodes created from the "passThrough" method because such nodes are placed in the "passThroughCache" collection. Instead of doing this unconditionally, we may introduce an additional predicate that would selectively enforce "derive" on such nodes. For example, this could be a default method in the PhysicalNode interface, like:

    interface PhysicalNode {
      default boolean enforceDerive() {
        return false;
      }
    }

If there are no objections, I'll proceed with this change. Alternatively, we may make the TopDownRuleDriver more "public", so that users can extend it and decide within the driver whether to cache a particular node or not.

I would appreciate your feedback on the matter.

Regards,
Vladimir.
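
P.S. To make the proposed predicate more concrete, here is a minimal sketch of how a driver could consult it before caching a node. Everything except the enforceDerive() method and the passThroughCache name is hypothetical: DriverSketch, SimpleNode, onPassThroughCreated and deriveStage are illustrative names, the PhysicalNode below is a reduced stand-in, and the real TopDownRuleDriver is organized differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the proposal; not the real TopDownRuleDriver.
final class EnforceDeriveSketch {
    public static void main(String[] args) {
        DriverSketch driver = new DriverSketch();
        PhysicalNode plain = new SimpleNode("A2", false);
        PhysicalNode template = new SimpleNode("A2.template", true);
        // Both nodes were created through the pass-through mechanics.
        driver.onPassThroughCreated(plain);
        driver.onPassThroughCreated(template);
        driver.deriveStage(Arrays.asList(plain, template));
        System.out.println(driver.derived); // prints "[A2.template]"
    }
}

/** Reduced stand-in for the real interface, carrying only the proposed method. */
interface PhysicalNode {
    String name();
    // Proposed opt-in: return true to be notified in the derive stage
    // even if the node was created from passThrough.
    default boolean enforceDerive() { return false; }
}

/** Assumed node implementation; template nodes opt in to derive. */
final class SimpleNode implements PhysicalNode {
    private final String name;
    private final boolean template;
    SimpleNode(String name, boolean template) { this.name = name; this.template = template; }
    @Override public String name() { return name; }
    @Override public boolean enforceDerive() { return template; }
}

final class DriverSketch {
    // Nodes created by passThrough that are skipped in the derive stage.
    private final Set<PhysicalNode> passThroughCache = new HashSet<>();
    // Names of the nodes that were notified in the derive stage.
    final List<String> derived = new ArrayList<>();

    void onPassThroughCreated(PhysicalNode node) {
        // Instead of caching unconditionally, consult the predicate first.
        if (!node.enforceDerive()) {
            passThroughCache.add(node);
        }
    }

    void deriveStage(List<PhysicalNode> nodes) {
        for (PhysicalNode node : nodes) {
            if (!passThroughCache.contains(node)) {
                derived.add(node.name());
            }
        }
    }
}
```

Since the default returns false, existing PhysicalNode implementations would keep the current behavior, and only nodes that override the method would pay for the extra derive calls.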