Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20434 )

Change subject: IMPALA-12408: Optimize HdfsScanNode.computeScanRangeLocations()
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20434/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20434/3//COMMIT_MSG@30
PS3, Line 30: changes how we count the number of partitions
            : per file system
Does it only change the "implementation" of how we count them or does it also 
change the result? It could be made clearer.


http://gerrit.cloudera.org:8080/#/c/20434/3/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:

http://gerrit.cloudera.org:8080/#/c/20434/3/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@550
PS3, Line 550:         if (sampleFileIdxs == null) {
Could use 'computeIfAbsent()'.


http://gerrit.cloudera.org:8080/#/c/20434/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20434/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@a1155
PS3, Line 1155:
Can 'computeScanRangeLocations()' be called multiple times on the same 
'HdfsScanNode' object? If it can, then before this change 'numPartitionsPerFs_' 
was reset here in each call, but in the new version we only add new values or 
increase existing ones.

If 'computeScanRangeLocations()' can only be called once, we could mention it 
in the function doc comment.


http://gerrit.cloudera.org:8080/#/c/20434/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1341
PS3, Line 1341:    * the scan ranges can be (may be ignored if the file is not 
splittable).
Could mention 'partitionLocation', for example that it must correspond to 
'partition'.


http://gerrit.cloudera.org:8080/#/c/20434/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1370
PS3, Line 1370:    * whether the file has a missing disk id and the maximum 
scan range (in bytes) found.
Could mention 'partitionLocation' and 'fsType', for example that they must 
correspond to 'partition'.



--
To view, visit http://gerrit.cloudera.org:8080/20434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf3e9c169d65c15df6a6762cc68fbb477fe64a7c
Gerrit-Change-Number: 20434
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Comment-Date: Wed, 30 Aug 2023 13:16:06 +0000
Gerrit-HasComments: Yes

Reply via email to