Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8480 )
Change subject: IMPALA-4985: use parquet stats of nested types for dynamic pruning ...................................................................... Patch Set 4: (12 comments) http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@179 PS4, Line 179: // A collection type is "required" if it is checked to be not empty. Collection > Shrink? The following seems sufficient. went with just referring to not-empty instead of not-empty and required. http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@184 PS4, Line 184: // Analysis adds IsNotEmptyPredicates where needed to enforce required collections. > I think this could be misunderstood. IsNotEmptyPredicates are not required, Done http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@436 PS4, Line 436: // Checks if slot refers to an array "pos" pseudo-column. > nit: we typically use /** */ style class and function comments whoops, Done http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@440 PS4, Line 440: private boolean isArrayPosReference(SlotRef slot) { > Move to SlotRef. I generally prefer 'slotRef' instead of 'slot' because the Done. sync'd with similar uses below as well. http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@442 PS4, Line 442: if (parent != null) { > reverse? if (parent == null) return false; Done http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@533 PS4, Line 533: * Adds nested collections that are required from 'conjuncts' to requiredCollections_. > Populates 'requiredCollections_' based on IsNotEmptyPredicates in the given Done http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@537 PS4, Line 537: * Example: table T has field A which is of type array<array<int>>. > I don't think examples really belong here because this is not where we are Done http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@548 PS4, Line 548: Preconditions.checkArgument(ref.getDesc().getType().isComplexType()); > checkState Done http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@549 PS4, Line 549: Preconditions.checkArgument(ref.getDesc().getItemTupleDesc() != null); > checkState Done http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@574 PS4, Line 574: // Note: it is conservative (but correct) to depend on a collection to be required. > I think we should state what we mean more directly. We rely on analysis to Done http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@586 PS4, Line 586: * collection-typed slot. As conjuncts are seen, mark nested collections that are > mark -> collect Done http://gerrit.cloudera.org:8080/#/c/8480/4/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/8480/4/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@33 PS4, Line 33: aggregation(SUM, NumRowGroups): 2 > we should also have planner tests to see how the explain looks Done -- To view, visit http://gerrit.cloudera.org:8080/8480 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe Gerrit-Change-Number: 8480 Gerrit-PatchSet: 4 Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Tue, 21 Nov 2017 00:33:26 +0000 Gerrit-HasComments: Yes