asolimando commented on a change in pull request #2966:
URL: https://github.com/apache/hive/pull/2966#discussion_r798411704
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinPushTransitivePredicatesRule.java
##########
@@ -144,27 +146,70 @@ public void onMatch(RelOptRuleCall call) {
}
// We need to filter i) those that have been pushed already as stored in
the join,
- // and ii) those that were already in the subtree rooted at child
- ImmutableList<RexNode> toPush =
HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude,
- child, valids);
- return toPush;
+ // ii) those that were already in the subtree rooted at child,
+ // iii) predicates that are not safe for transitive inference.
+ //
+ // There is no formal definition of safety for predicate inference, only
an empirical one.
Review comment:
The change does not reinforce the dependency on the registry. If you
check the call, we are filtering candidate predicates coming from
`HiveRelMdPredicates`, which is not linked to the registry at all. The
predicates in the registry are used to add another filtering step (to remove
"duplicates"), but the two steps are totally independent.
This said, I agree that keeping a global state is a problem, and should be
avoided. The current implementation is also faulty under some respects.
The motivation for storing predicates in the registry is given in
[HIVE-12478](https://issues.apache.org/jira/browse/HIVE-12478?focusedCommentId=15047639&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15047639).
Since predicates can be modified (pushed past project/setOp/joins/aggregate,
they can be simplified or merged), we can try to push them transitively as it
happens in this OOM cases.
But, as I was saying, the current implementation is faulty, since even the
registry is not enough, because most of the other rules they don't know about
it, they will alter the predicates, but they won't register them into the
registry. Making all those rules aware of the registry would increase the
coupling, hence not an option IMO.
The registry main use was to prevent firing the same rule against the same
rel multiple times, but even this use is not complete, since rels are modified
along the way, and we don't update the registry accordingly. For instance, a
join condition is updated, the new join has a fresh rel-id, the registry will
try to re-apply `HiveJoinPushTransitivePredicatesRule`, and if the predicates
were modified too, this will also escape the duplicate check based on the
`registry.getPushedPredicates(join)`, and we can have a loop).
On the other hand, I don't see how we can prevent such loops without storing
a global state. But for this global state to work, each and every rule must be
aware of it, which is totally not doable IMO, which is also the line in Calcite.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]