Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 1:

(11 comments)

Still working on a minor test fix, sending out to continue CR.

http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 31:  * Extracts common conjuncts from disjunctive CompoundPredicates.
> Might be worth commenting on how this handles > 2 disjuncts. I.e. you need 
Expanded comments on ExprRewriter and modified comment here. 

(I don't think we should document how the framework works in every rule.)


Line 37: public class ExtractCommonConjunctRule implements ExprRewriteRule {
> where is this being applied?
At the end of analysis, made those changes (had forgotten this after 
rebasing/cleaning)


Line 42:     if (!Expr.IS_OR_PREDICATE.apply(expr)) return expr;
> Isn't this check a little weak in the sense that it only checks the root's 
Having separate rules for normalization certainly makes sense, but I see them 
as being independent of this change. I don't think we'd change the 
code/behavior of the current rule even if we had normalization.

In your example, we would not return 'a' because the single conjunct of child0 
would be (a OR b) which is not common to any of the child1 conjuncts "a" and 
"c".

Maybe you can come up with another example that breaks?


Line 49:       if (child1Conjuncts.contains(conjunct)) {
> This is n^2 so will get very slow if we have long lists of disjuncts, e.g. 
Agree that this is problematic. Unfortunately, also non-trivial to fix and make 
sure the hashCode() and equals() functions are correct.

I added an arbitrary threshold on the maximum number of Expr.equals() 
comparisons to bail on these pathological cases.

It's also not clear whether this rule would be much worse than the codegen in 
the BE. Not a good argument to avoid the improvement, but the improvement here 
alone might not make a material impact for the overall query.


Line 72:     if (!child0Conjuncts.isEmpty()) {
> At this point both child0Conjuncts and child1Conjuncts are non-empty becaus
Right, forgot to correct after some cleanup. Done.


PS1, Line 84: newDisjuncts.size() > 1
> See above - this should always be true.
You are right, thanks. Done.


http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 146:         "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " +
> Slightly tangential, but should we apply De Morgan's first to normalise the
Completely agree about having additional rules for normalization. I view those 
as somewhat independent of the current rule. I don't think the code/approach of 
the current rule would change much even if we had normalization. Of course, 
normalizing would make the current rule useful in more cases.

Let's continue this train of thought and come up with useful normalization 
rules in a separate change set. Definitely needed.


Line 153:         "(int_col < 10 and id between 5 and 6) or " +
> Does common conjunct extraction work if you have a BETWEEN expression and t
Good point, there was a bug. Fixed bug and added tests.


http://gerrit.cloudera.org:8080/#/c/4877/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test:

Line 3384:       p_brand = 'Brand#12'
> i agree, i think we need normalization as a separate transformation step. t
Agree.


Line 3384:       p_brand = 'Brand#12'
> I'm not really sure how this patch handles a bushy predicate. Do we flatten
Agree we need normalization as a separate set of rules.

I'm not sure normalization is required for your specific example though. The 
interesting conjuncts are those that bubble up all the way to the top so that, 
e.g., we can assign them as scan predicates. I think the current implementation 
does that, although some opportunities for simplifying disjunctive 
subexpressions are missed.


Line 3393:       p_brand = 'Brand#23'
> additional idea for transformation rule: derive additional, non-essential d
Agree. Sounds like IMPALA-2108.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to