Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21871 )
Change subject: IMPALA-12861: Fix mixed file format listing for Iceberg tables ...................................................................... Patch Set 2: (6 comments) Thanks for working on this! http://gerrit.cloudera.org:8080/#/c/21871/2/common/fbs/IcebergObjects.fbs File common/fbs/IcebergObjects.fbs: http://gerrit.cloudera.org:8080/#/c/21871/2/common/fbs/IcebergObjects.fbs@23 PS2, Line 23: // We add AVRO here as a future possibility. : // The Iceberg spec allows AVRO data files, but currently Impala : // cannot read such Iceberg tables. See IMPALA-11158. We can remove this comment. http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java: http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@195 PS2, Line 195: == FbIcebergDataFileFormat.AVRO You could write "!= 0" which is shorter and less error-prone. http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@252 PS2, Line 252: if Since this is not an 'else if', the 'if' should be placed to the following line. Same goes for L254. http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@524 PS2, Line 524: populateFileFormats(); Based on the method name it's weird that we populate file formats here. Maybe we could just move this call to init(). Though in this case populateFileFormats() might need a precondition check that sampledPartitions_ is filled if sampling is involved. http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@278 PS2, Line 278: convertFileFormats() There can be three SCAN nodes for a single Iceberg table (if the table has EQ-delete files, there can be even more): UNION ALL / \ / \ / \ SCAN all ANTI JOIN datafiles / BUILD without / \ deletes SCAN SCAN datafiles position with deletes deletes This approach sets the same file formats for all which might not be true. If there are no perf problems with the old method (populating fileFormats_ in the c'tor of IcebergScanNode), then I think we should keep that. http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@867 PS2, Line 867: == FbIcebergDataFileFormat.AVRO You could write "!= 0" which is shorter and less error-prone. -- To view, visit http://gerrit.cloudera.org:8080/21871 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifae900914a0d255f5a4d9b8539361247dfeaad7b Gerrit-Change-Number: 21871 Gerrit-PatchSet: 2 Gerrit-Owner: Peter Rozsa <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 08 Oct 2024 16:37:11 +0000 Gerrit-HasComments: Yes
