Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/21871 )
Change subject: IMPALA-12861: Fix mixed file format listing for Iceberg tables ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/21871/2/common/fbs/IcebergObjects.fbs File common/fbs/IcebergObjects.fbs: http://gerrit.cloudera.org:8080/#/c/21871/2/common/fbs/IcebergObjects.fbs@23 PS2, Line 23: AVRO = 4 : } : > We can remove this comment. Done http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java: http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@195 PS2, Line 195: boolean hasAvro() { return has > You could write "!= 0" which is shorter and less error-prone. Done http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@252 PS2, Line 252: > Since this is not an 'else if', the 'if' should be placed to the following Done http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@524 PS2, Line 524: > Based on the method name it's weird that we populate file formats here. May Unfortunately, the count star optimization requires the fileFormats_ filled, and if I move up this method to the init(), then the sampling part is missing before gathering the file formats. It'd be better, if the checkSamplingAndCountStar method could be split in two, and this method should go between those, but I would not like to make this intrusive change. http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@278 PS2, Line 278: > There can be three SCAN nodes for a single Iceberg table (if the table has Reverted back to the original approach http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@867 PS2, Line 867: > You could write "!= 0" which is shorter and less error-prone. Reverted back to the original approach, I also made a microbenchmark for bitflags, BitSet, and the current solution: the bitflag solution is 10x faster than the current solution, but overall, the file format population takes like 100 ms for 2 million entries which is negligible in contrast to the Hadoop Path construction during file descriptor creation. -- To view, visit http://gerrit.cloudera.org:8080/21871 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifae900914a0d255f5a4d9b8539361247dfeaad7b Gerrit-Change-Number: 21871 Gerrit-PatchSet: 3 Gerrit-Owner: Peter Rozsa <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 15 Oct 2024 09:30:35 +0000 Gerrit-HasComments: Yes
