Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8480 )

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic 
pruning
......................................................................


Patch Set 4:

(12 comments)

Looks pretty good, mostly comments on phrasing.

http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@179
PS4, Line 179:   // A collection type is "required" if it is checked to be not 
empty. Collection
Shrink? The following seems sufficient.

// Tuple descriptors of collection slots that have an IsNotEmptyPredicate. See 
SelectStmt#registerIsNotEmptyPredicates().

Do we really need a new term "required". The meaning of that term is not clear 
clear to me from this comment. I think my suggested comment above is pretty 
clear. A collection slot is required if an only if it has an 
IsNotEmptyPredicate.


http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@184
PS4, Line 184:   // Analysis adds IsNotEmptyPredicates where needed to enforce 
required collections.
I think this could be misunderstood. IsNotEmptyPredicates are not required, 
they are purely a perf optimization, they don't really "enforce" anything in 
the query semantics sense.

Suggest shrinking text as indicated above.


http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@436
PS4, Line 436:   // Checks if slot refers to an array "pos" pseudo-column.
nit: we typically use /** */ style class and function comments


http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@440
PS4, Line 440:   private boolean isArrayPosReference(SlotRef slot) {
Move to SlotRef. I generally prefer 'slotRef' instead of 'slot' because the 
latter could be misunderstood as a SlotDescriptor.


http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@442
PS4, Line 442:     if (parent != null) {
reverse? if (parent == null) return false;


http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@533
PS4, Line 533:    * Adds nested collections that are required from 'conjuncts' 
to requiredCollections_.
Populates 'requiredCollections_' based on IsNotEmptyPredicates in the given 
conjuncts.

(don't think we need to say more)


http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@537
PS4, Line 537:    * Example: table T has field A which is of type 
array<array<int>>.
I don't think examples really belong here because this is not where we are 
adding the predicates. Instead they should be in 
SelecmStmt.registerIsNotEmptyPredicates().


http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@548
PS4, Line 548:         
Preconditions.checkArgument(ref.getDesc().getType().isComplexType());
checkState


http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@549
PS4, Line 549:         
Preconditions.checkArgument(ref.getDesc().getItemTupleDesc() != null);
checkState


http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@574
PS4, Line 574:       // Note: it is conservative (but correct) to depend on a 
collection to be required.
I think we should state what we mean more directly. We rely on analysis to 
generate IsNotEmptyPredicates where correct, but analysis is conservative and 
misses some opportunities.


http://gerrit.cloudera.org:8080/#/c/8480/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@586
PS4, Line 586:    * collection-typed slot. As conjuncts are seen, mark nested 
collections that are
mark -> collect


http://gerrit.cloudera.org:8080/#/c/8480/4/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/8480/4/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@33
PS4, Line 33: aggregation(SUM, NumRowGroups): 2
we should also have planner tests to see how the explain looks



--
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 4
Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Nov 2017 22:27:49 +0000
Gerrit-HasComments: Yes

Reply via email to