Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20434 )
Change subject: IMPALA-12408: Optimize HdfsScanNode.computeScanRangeLocations()
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20434/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20434/5//COMMIT_MSG@21
PS5, Line 21: in Impala
nit: during Impala planning,


http://gerrit.cloudera.org:8080/#/c/20434/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20434/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@333
PS5, Line 333:   Map<Long, List<FileDescriptor>> sampledFiles_ = null;
Please document what the Long key in this map represents. It looks like it is a partition ID?


http://gerrit.cloudera.org:8080/#/c/20434/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1150
PS5, Line 1150:     for (FeFsPartition partition: partitions_) {
General question: is it worthwhile, or even possible, to parallelize this loop? Maybe using Java's parallel streams?


http://gerrit.cloudera.org:8080/#/c/20434/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1153
PS5, Line 1153:       String partitionLocation = partition.getLocation();
              :       Path partitionPath = new Path(partitionLocation);
> question: is it make sense to pass down the partitionPath instead of the ra
Ok, so a consistent hash code from Java's String.hashCode() turns out to be important. It may be a good idea to point that out in a comment.

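To make the two HdfsScanNode.java suggestions above (documenting the map key, and the parallel-stream question on line 1150) more concrete, here is a minimal sketch. It is not the actual Impala code: the Partition class, collectFiles() and the file names are hypothetical stand-ins, and it assumes the per-partition work does not depend on other partitions. The real loop body may well update shared planner state, in which case that state would need thread-safe accumulation before anything like this could apply.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ParallelPartitionSketch {
  /** Hypothetical stand-in for a partition: an ID plus the names of its files. */
  static final class Partition {
    final long id;
    final List<String> fileNames;

    Partition(long id, List<String> fileNames) {
      this.id = id;
      this.fileNames = fileNames;
    }
  }

  /**
   * Returns a map from partition ID (the Long key) to the files of that partition.
   * Stating what the key means in a comment like this is what is being asked for
   * sampledFiles_.
   */
  static Map<Long, List<String>> collectFiles(List<Partition> partitions) {
    // toConcurrentMap accumulates into a ConcurrentMap, so worker threads can add
    // entries safely. It throws IllegalStateException on duplicate keys, so this
    // assumes partition IDs are unique.
    return partitions.parallelStream()
        .collect(Collectors.toConcurrentMap(p -> p.id, p -> p.fileNames));
  }

  public static void main(String[] args) {
    List<Partition> partitions = Arrays.asList(
        new Partition(1L, Arrays.asList("a.parq", "b.parq")),
        new Partition(2L, Arrays.asList("c.parq")));
    Map<Long, List<String>> files = collectFiles(partitions);
    // Number of files collected for partitions 1 and 2.
    System.out.println(files.get(1L).size() + " " + files.get(2L).size());
  }
}
```

Whether this pays off depends on how expensive the per-partition work is compared with the overhead of the common fork-join pool, so it would need measuring on a table with many partitions.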
-- 
To view, visit http://gerrit.cloudera.org:8080/20434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf3e9c169d65c15df6a6762cc68fbb477fe64a7c
Gerrit-Change-Number: 20434
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 30 Aug 2023 17:02:26 +0000
Gerrit-HasComments: Yes