I like the idea of redundant filters. We should probably add a new kind of metadata so that one node can see that a filter is redundant. I’m not sure how that metadata would look. A well chosen use case will help us design it. Does anyone have one?
> On Sep 27, 2023, at 1:40 AM, Stamatis Zampetakis <zabe...@gmail.com> wrote: > > Hey Ian, > > I don't think there is such a rule in Calcite but you may find similar > ideas in rules present in other projects. > > In Hive for instance, there is the HivePreFilteringRule [1, 2] that > pushes "redundantly" some filters below other operations in the tree. > What is inherently problematic with all these rules that introduce > "duplicate" predicates in the plan is that they require some kind of > state to prevent infinite matching. Additionally, you need to pay > attention when you put together rules that are pushing and pulling > since they can interact badly with each other. > > Best, > Stamatis > > [1] > https://github.com/apache/hive/blob/57f096c9a73eb92806f2a7cc97f87fabf5d546fe/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePreFilteringRule.java > [2] > https://issues.apache.org/jira/browse/HIVE-9069?focusedCommentId=14534098&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14534098 > >> On Tue, Sep 26, 2023 at 10:10 PM Ian Bertolacci >> <ian.bertola...@workday.com.invalid> wrote: >> >> Hi, >> I was wondering if there exist any rules to duplicate filters which exist >> above the join, whose effect is dependent on the result of the join and >> therefore cannot be *pushed* below a join, but could be *duplicated* below >> the join. >> >> For example: `select … from A LEFT join B on … where B.field is null` >> Here, the best we could do is push the filter condition into the join >> condition, but not necessarily below it, because the null-ness of the column >> is partially dependent on the result of the join. >> However, in this case we can duplicate the condition below the join: >> `select … from A LEFT join (select … from B where B.field is null) as B on … >> where B.field is null` >> This is because the condition because the null-ness of the column is also >> partially dependent value of the column. >> With both of these filters in place we capture instances of B which are null >> because the column is null and because there was no match to B >> This (1) reduced the cardinality of that side of the join, and (2) >> maintained the original intent of the query. >> >> In this example, I use `is null` but we would like to do this for some of >> our custom comparison operators. >> For these operators, we cannot do push-down (because it would change the >> intent of the original query) but doing filter duplication should be fine >> (though we’re still making sure of that). >> >> I figure that this probably doesn’t exist, in which case I’ll probably use >> FilterIntoJoinRule as a jumping off point. >> Any other suggestions? >> >> Thanks! >> -Ian J. Bertolacci