Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15047 )
Change subject: IMPALA-8361: Fix bound predicates optimization doesn't work for InlineView ...................................................................... Patch Set 4: (7 comments) Thanks for digging into this! If I understand this solution correctly, the key change of this patch is inside SingleNodePlanner#migrateConjunctsToInlineView(). Before this change, we skip predicate b.upper_val='1' since canEvalPredicate() returns false on it. Because it doesn't satisfy isLastOjMaterializedByTupleIds(). So we can't migrate it inside the inline view. However, we can be more aggressive. It's correct to duplicate(not migrate) this predicate inside the inline view since it's not evaluted to true with null slots. This patch picks up the predicates that being skipped due to not satisfying isLastOjMaterializedByTupleIds() and duplicates them if they satisfy !isTrueWithNullSlots(). I left some comments for refactoring. Hope we can make this part clearer. http://gerrit.cloudera.org:8080/#/c/15047/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15047/4//COMMIT_MSG@7 PS4, Line 7: Fix bound predicates optimization doesn't work for : InlineView This is an improvement rather than a fix. I think this is better: "Propagate predicates of outer-joined InlineView". http://gerrit.cloudera.org:8080/#/c/15047/4//COMMIT_MSG@10 PS4, Line 10: This is improvement that migrated predicates of the nullable side of : the outer join into inline view.We should duplicate it and evaluate : after join. Not all the predicates of the nullable side could be duplicated into the inline view, e.g. "b.upper_val is null" can only be evaluated at the JOIN node. Hope you can explain your change in more details. http://gerrit.cloudera.org:8080/#/c/15047/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/15047/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1623 PS4, Line 1623: if (isAntiJoinedConjunct(e)) return canEvalAntiJoinedConjunct(e, tupleIds); : : if (isIjConjunct(e) || isSjConjunct(e)) { : if (!containsOuterJoinedTid(tids)) return true; : // If the predicate references an outer-joined tuple, then evaluate it at : // the join that the On-clause belongs to. : TableRef onClauseTableRef = null; : if (isIjConjunct(e)) { : onClauseTableRef = globalState_.ijClauseByConjunct.get(e.getId()); : } else { : onClauseTableRef = globalState_.sjClauseByConjunct.get(e.getId()); : } : Preconditions.checkNotNull(onClauseTableRef); : return tupleIds.containsAll(onClauseTableRef.getAllTableRefIds()); : } : : if (isFullOuterJoined(e)) return canEvalFullOuterJoinedConjunct(e, tupleIds); : if (isOjConjunct(e)) { : // Force this predicate to be evaluated by the corresponding outer join node. : // The join node will pick up the predicate later via getUnassignedOjConjuncts(). : if (tids.size() > 1) return false; : // Optimization for single-tid predicates: Legal to assign below the outer join : // if the predicate is from the same On-clause that makes tid nullable : // (otherwise e needn't be true when that tuple is set). : TupleId tid = tids.get(0); : return globalState_.ojClauseByConjunct.get(e.getId()) == getLastOjClause(tid); : } : : // Should have returned in one of the cases above. : Preconditions.checkState(false); The new overload of canEvalPredicate() is not quite clear in my opinion. We can encapsulate these into a function 'canEvalOnClauseConjunct(tupleIds, e)' and use it directly in SingleNodePlanner#migrateConjunctsToInlineView(). http://gerrit.cloudera.org:8080/#/c/15047/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/15047/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@184 PS4, Line 184: Expr.removeDuplicates(conjuncts_); Can we move this into MultiAggregateInfo#collectConjuncts()? http://gerrit.cloudera.org:8080/#/c/15047/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/15047/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1191 PS4, Line 1191: if (analyzer.canEvalPredicate(tupleIds, e, false)) { : if (e.isOnClauseConjunct() || : analyzer.isLastOjMaterializedByTupleIds(tupleIds, e)) { : preds.add(e); : } else { : // Deal predicate belong to an outer-joined tuple : try { : if (analyzer.isTrueWithNullSlots(e)) continue; : } catch (InternalException ex) { : // Expr evaluation failed in the backend. Skip 'e' since we cannot : // determine whether propagation is safe. : LOG.warn("Skipping propagation of conjunct because backend evaluation " : + "failed: " + e.toSql(), ex); : continue; : } : evalAfterJoinPreds.add(e); : } Althouth the first if-clause is semantically equal to the old logics, it's not quite clear since e.isOnClauseConjunct() and analyzer.isLastOjMaterializedByTupleIds(tupleIds, e) which are part of canEvalPredicate(). I recommend refactoring it to not invoke canEvalPredicate() like this (with my comment in Analyzer.java): if (e.isOnClauseConjunct()) { if (analyzer.canEvalOnClauseConjunct(tupleIds, e)) preds.add(e); } else if (analyzer.isLastOjMaterializedByTupleIds(tupleIds, e)) { preds.add(e); } else { // Deal with other predicates bounded to an outer-joined tuple try { if (analyzer.isTrueWithNullSlots(e)) continue; } catch (InternalException ex) { // Expr evaluation failed in the backend. Skip 'e' since we cannot // determine whether propagation is safe. LOG.warn("Skipping propagation of conjunct because backend evaluation " + "failed: " + e.toSql(), ex); continue; } evalAfterJoinPreds.add(e); } Also add some trace logging for better debugging. What do you think? http://gerrit.cloudera.org:8080/#/c/15047/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1219 PS4, Line 1219: // Migrate the conjucts evaluate the nullable side of outer-join. They should also : // evaluate after join, not mark as assigned. "Propagate the conjuncts evaluating the nullable side of outer-join. Don't mark them as assigned so they would be assigned at the JOIN node." http://gerrit.cloudera.org:8080/#/c/15047/4/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test: http://gerrit.cloudera.org:8080/#/c/15047/4/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test@1002 PS4, Line 1002: 730 This is actually 73 after adding the predicate. Note that the test passes since it doesn't verify the cardinality (run without PlannerTestOption.VALIDATE_CARDINALITY). But it's good to correct the value as well. -- To view, visit http://gerrit.cloudera.org:8080/15047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6c23a45aeb5dd1aa06a95c9aa8628ecbe37ef2c1 Gerrit-Change-Number: 15047 Gerrit-PatchSet: 4 Gerrit-Owner: Xianqing He <hexianqing...@126.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Xianqing He <hexianqing...@126.com> Gerrit-Comment-Date: Mon, 20 Jan 2020 14:46:53 +0000 Gerrit-HasComments: Yes