Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20434 )
Change subject: IMPALA-12408: Optimize HdfsScanNode.computeScanRangeLocations() ...................................................................... Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/20434/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20434/3//COMMIT_MSG@30 PS3, Line 30: changes how we count the number of partitions : per file system Does it only change the "implementation" of how we count them or does it also change the result? It could be made clearer. http://gerrit.cloudera.org:8080/#/c/20434/3/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java: http://gerrit.cloudera.org:8080/#/c/20434/3/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@550 PS3, Line 550: if (sampleFileIdxs == null) { Could use 'computeIfAbsent()'. http://gerrit.cloudera.org:8080/#/c/20434/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/20434/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@a1155 PS3, Line 1155: Can 'computeScanRangeLocations()' be called multiple times on the same 'HdfsScanNode' object? If it can, then before this change 'numPartitionsPerFs_' was reset here in each call, but in the new version we only add new values or increase existing ones. If 'computeScanRangeLocations()' can only be called once, we could mention it in the function doc comment. http://gerrit.cloudera.org:8080/#/c/20434/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1341 PS3, Line 1341: * the scan ranges can be (may be ignored if the file is not splittable). Could mention 'partitionLocation', for example that it must correspond to 'partition'. http://gerrit.cloudera.org:8080/#/c/20434/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1370 PS3, Line 1370: * whether the file has a missing disk id and the maximum scan range (in bytes) found. Could mention 'partitionLocation' and 'fsType', for example that they must correspond to 'partition'. -- To view, visit http://gerrit.cloudera.org:8080/20434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf3e9c169d65c15df6a6762cc68fbb477fe64a7c Gerrit-Change-Number: 20434 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Comment-Date: Wed, 30 Aug 2023 13:16:06 +0000 Gerrit-HasComments: Yes