Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8480 )
Change subject: IMPALA-4985: use parquet stats of nested types for dynamic pruning ...................................................................... Patch Set 4: (12 comments) Looks pretty good, mostly comments on phrasing. http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@179 PS4, Line 179: // A collection type is "required" if it is checked to be not empty. Collection Shrink? The following seems sufficient. // Tuple descriptors of collection slots that have an IsNotEmptyPredicate. See SelectStmt#registerIsNotEmptyPredicates(). Do we really need a new term "required". The meaning of that term is not clear clear to me from this comment. I think my suggested comment above is pretty clear. A collection slot is required if an only if it has an IsNotEmptyPredicate. http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@184 PS4, Line 184: // Analysis adds IsNotEmptyPredicates where needed to enforce required collections. I think this could be misunderstood. IsNotEmptyPredicates are not required, they are purely a perf optimization, they don't really "enforce" anything in the query semantics sense. Suggest shrinking text as indicated above. http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@436 PS4, Line 436: // Checks if slot refers to an array "pos" pseudo-column. nit: we typically use /** */ style class and function comments http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@440 PS4, Line 440: private boolean isArrayPosReference(SlotRef slot) { Move to SlotRef. I generally prefer 'slotRef' instead of 'slot' because the latter could be misunderstood as a SlotDescriptor. http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@442 PS4, Line 442: if (parent != null) { reverse? if (parent == null) return false; http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@533 PS4, Line 533: * Adds nested collections that are required from 'conjuncts' to requiredCollections_. Populates 'requiredCollections_' based on IsNotEmptyPredicates in the given conjuncts. (don't think we need to say more) http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@537 PS4, Line 537: * Example: table T has field A which is of type array<array<int>>. I don't think examples really belong here because this is not where we are adding the predicates. Instead they should be in SelecmStmt.registerIsNotEmptyPredicates(). http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@548 PS4, Line 548: Preconditions.checkArgument(ref.getDesc().getType().isComplexType()); checkState http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@549 PS4, Line 549: Preconditions.checkArgument(ref.getDesc().getItemTupleDesc() != null); checkState http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@574 PS4, Line 574: // Note: it is conservative (but correct) to depend on a collection to be required. I think we should state what we mean more directly. We rely on analysis to generate IsNotEmptyPredicates where correct, but analysis is conservative and misses some opportunities. http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@586 PS4, Line 586: * collection-typed slot. As conjuncts are seen, mark nested collections that are mark -> collect http://gerrit.cloudera.org:8080/#/c/8480/4/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/8480/4/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@33 PS4, Line 33: aggregation(SUM, NumRowGroups): 2 we should also have planner tests to see how the explain looks -- To view, visit http://gerrit.cloudera.org:8080/8480 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe Gerrit-Change-Number: 8480 Gerrit-PatchSet: 4 Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Mon, 20 Nov 2017 22:27:49 +0000 Gerrit-HasComments: Yes