Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough ......................................................................
Patch Set 7: (40 comments) http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: Line 145: DCHECK(child(0)->row_desc().IsPrefixOfEquivalent(row_desc())); > This should be IsPrefixOf() because we sanity checking the row descriptors Done http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 115: // passthrough case, the child can be closed right away because the row batch received > the child can be closed right away -> the child was already closed? Done Line 116: // from the child is copied (more details below). > accuracy: the row batch wasn't copied Done Line 121: if (child_idx_ < children_.size() && isChildPassThrough(child_idx_)) { > Suggest comment: Done Line 122: // If the rows from the current child can be passed through, the physical row layout > This comment doesn't seem to add anything, I suggest removing it. Replaced this with your suggestion. Line 131: // It may be possible that the row batch is not empty, so we save the previous value. > More details would be helpful. If the batch has rows at this point, I'd ima Added a dcheck instead. Time made this suggestion in patch 4. Line 148: // 'needs_deep_copy' lets us safely close the child in the next GetNext() call. > style: 'needs_deep_copy' is not a visible variable here, suggest just sayin Done Line 154: > DCHECK that child_idx_ is not passthrough here Done http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 67: // Which children can be passed through, without being materialized. > ... without evaluating and materializing their exprs. Done http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: Line 412: /// Return true if the physical layout of this descriptor matches the physical layout > matches that of other_desc Done Line 413: /// of other_desc, but not necessarily ids. > bot not necessarily the id. Done Line 565: /// of other_desc, but not necessarily ids. > but not necessarily the ids Done http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/service/query-options.h File be/src/service/query-options.h: Line 38: TImpalaQueryOptions::DISABLE_UNION_PASSTHROUGH + 1);\ > I tend to prefer ENABLE_UNION_PASSTHROUGH. To me positive phrasing is a lit We have both positive and negative like DISABLE_CODEGEN and ENABLE_EXPR_REWRITES. I agree that ENABLE is simpler and easier to think about. (We should rename all DISABLE options.) http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: Line 227: public boolean isEquivalent(SlotDescriptor other) { > Unfortunately, the term 'equivalent' already has a different meaning in the Done http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java: Line 616: public List<Expr> getUnionNodeResultExprs() { > getUnionResultExprs() Done http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/planner/UnionNode.java File fe/src/main/java/org/apache/impala/planner/UnionNode.java: Line 47: * a child has an identical tuple layout as the output of the union node, the > ... as the output of the union node, and the child only has naked SlotRefs Done Line 57: protected final List<Expr> resultExprs_; > unionResultExprs_ to make distinguish it from the resultExprLists_ and such Done Line 73: // If false, no batches from child nodes would be passed through in the backend. > Comment should describe what this flag is. Also you mean "true" right? Done Line 76: // Indicates for which child nodes batches can be passed through in the backend. > remove "in the backend" (it's clear that execution happens there) Done Line 81: protected UnionNode(PlanNodeId id, TupleId tupleId) { > Is this c'tor still needed? If not, remove. Yes, it's used if we are creating a constant node. (with no children) Line 89: List<Expr> resultExprs, boolean isInSubplan) { > indentation, unionResultExprs Done Line 182: * Compute whether we can pass through rows without materializing for the given child. > Can combine/simplify like this: Done. I don't think it's necessary to describe the passthrough conditions here. The implementation makes it clear. Line 189: Analyzer analyzer, List<TupleId> childTupleIds, List<Expr> childExprList) { > childExprList -> childResultExprs Done Line 190: if (analyzer.getQueryOptions().isDisable_union_passthrough()) return false; > seems clearer to move this into init() and not execute any of the passthrou We need to construct a list of booleans to indicate if the child can be passed through. We would have to then construct a list of false in init() if passthrough is disabled. I think it's simpler if we keep it the way it is. Line 193: // TODO: Remove this as part of IMPALA-4179. > Move TODO to the class comment This TODO seems a little out of place in the class comment. Won't we have to give additional information there for this comment to make sense. Line 194: if (isInSubplan_) return false; > same here, seems easier to move this check into init() Same as above. It's weird to special case adding a false to the list. Line 205: // Verify that the union node has one slot for every expression. > union node -> union tuple descriptor Done Line 212: if (resultExprs_.size() != childTupleDescriptor.getSlots().size()) return false; > I don't think this tricky check is correct because it won't allow passthrou Created a JIRA for handling advanced passthrough cases. Line 218: SlotRef unionSlot = resultExprs_.get(i).unwrapSlotRef(false); > unionSlotRef, childSlotRef Done Line 220: if (!unionTupleDescriptor.getSlots().get(i).isMaterialized()) continue; > move above the unwrapSlotRef() calls Done Line 221: Preconditions.checkState(unionSlot.getDesc().getParent().getId().equals(tupleId_)); > Don't think we need this check, but something like this would be good: Done Line 223: Preconditions.checkState(!(childSlot instanceof SlotRef)); > No need for this check Done Line 262: // Compute which nodes can be passed through. > which child nodes Done Line 266: // Check that if the child outputs a single tuple, then it's not nullable. Tuple > move into computePassThrough Done http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 329: # IMPALA-3586: The operand with the Kudu scan cannot be passed through because it's not > because id is not-nullable (primary key) Done Line 346: select id from functional_kudu.alltypes > do select * With select *, passthrough doesn't get enabled. The layout of the union tuple is different that the layout of the child tuples. http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-planner/queries/PlannerTest/union.test File testdata/workloads/functional-planner/queries/PlannerTest/union.test: Line 3104: # IMPALA-3678: Both union operands should produce rows with non-nullable slots which > remove "should" Done Line 3124: # IMPALA-3678: The Union operands that contain a join should not be passed through, > nice thanks! Line 3184: select COUNT(c.c_custkey), COUNT(v.tot_price) > lowercase count for consistency Done http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-query/queries/QueryTest/union.test File testdata/workloads/functional-query/queries/QueryTest/union.test: Line 1126: select COUNT(c.c_custkey), COUNT(v.tot_price) > lowercase count for consistency Done -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes