Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19043 )
Change subject: IMPALA-11591: Avoid calling planFiles() on Iceberg tables ...................................................................... Patch Set 3: (7 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/19043/2/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/19043/2/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@567 PS2, Line 567: table, new ArrayList<>(), /*timeTravelSpecl=*/null); : for (ContentFile contentFile : Iterables.concat(allFiles.first, allFiles.second)) { : addContentFileToFileStore(contentFile, fileStore, table, hdfsFileDescMap); : } : return fileStore; : } : : private static void addContentFileToFileStore(ContentFile contentFile, : IcebergContentFileStore fileStore, IcebergTable table, : Map<String, HdfsPartition.FileDescriptor> hdfsFileDescMap) throws IOException { : String pathHash = IcebergUtil.getFilePathHash(contentFile); : H > nit: The logic of these two for loops is similar, so would it be better if Refactored the code a bit. http://gerrit.cloudera.org:8080/#/c/19043/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java: http://gerrit.cloudera.org:8080/#/c/19043/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@44 PS2, Line 44: // List of Iceberg data files (doesn't include delete files) : private final List<FileDescriptor> dataFiles_ = new ArrayList< > nit: We can add a comment so that others can understand them quickly. Added comments. http://gerrit.cloudera.org:8080/#/c/19043/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@48 PS2, Line 48: private final List<Fi > We already initialized the member variable at declaration moment and will n Added final declarations. http://gerrit.cloudera.org:8080/#/c/19043/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@106 PS2, Line 106: etworkAddresses, > The Field Requiredness of 'TIcebergContentFileStore.path_hash_to_data_file Done http://gerrit.cloudera.org:8080/#/c/19043/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@108 PS2, Line 108: et = new IcebergContentFileStore(); > ditto Done http://gerrit.cloudera.org:8080/#/c/19043/2/fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java: http://gerrit.cloudera.org:8080/#/c/19043/2/fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java@119 PS2, Line 119: : @Override > This is OK, but I think 'throw new NotImplementedException("msg");' is much Done http://gerrit.cloudera.org:8080/#/c/19043/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/19043/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@471 PS2, Line 471: getDefaultPartitionSpec > Whether time-travel should be considered? If this gets the most recent 'Par Seems like we can't retrieve the default partition spec for a snapshot: https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/Snapshot.java We can only get the partition spec from the table object: https://github.com/apache/iceberg/blob/7c2fcccf4a033ce620a254fae4ddc92cdc10bf85/api/src/main/java/org/apache/iceberg/Table.java#L103 So at this point, to determine if a column is involved in partitioning, we can either: * use the current partition spec * use all partition specs of the table (https://github.com/apache/iceberg/blob/7c2fcccf4a033ce620a254fae4ddc92cdc10bf85/api/src/main/java/org/apache/iceberg/Table.java#L110) I think using the current partition spec makes sense, as typically users tend to include more columns in partitioning over time. Even if this method returns false negatives/positives we can only lose on performance, but the end results should be correct. Let me know if you think otherwise and we should use all partition specs. -- To view, visit http://gerrit.cloudera.org:8080/19043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iadb883a28602bb68cf4f61e57cdd691605045ac5 Gerrit-Change-Number: 19043 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Anonymous Coward <lipeng...@sensorsdata.cn> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Mon, 17 Oct 2022 16:36:02 +0000 Gerrit-HasComments: Yes