Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/24041 )
Change subject: IMPALA-14583: Support partial RPC dispatch for Iceberg tables ...................................................................... Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java: http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@155 PS1, Line 155: } If the whole container is needed, a fast path could copy it without constructing the reverseMap. http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@480 PS1, Line 480: // Always include metadata flags and partitions : ret.setHas_avro(hasAvro_); : ret.setHas_orc(hasOrc_); : ret.setHas_parquet(hasParquet_); : ret.setMissing_files(new ArrayList<>(missingFiles_)); : ret.setPartitions(convertPartitionMapToList(partitions_)); Isn't it enough to set these in first request? partitions_ can be pretty large. http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@869 PS1, Line 869: long fileOffset = req.table_info_selector.isSetIceberg_file_offset() : ? req.table_info_selector.iceberg_file_offset : 0; : long fileLimit = BackendConfig.INSTANCE.getCatalogPartialFetchMaxFiles(); Not sure if cross compatibility between old coordinator and new catalogd is a goal here, but if yes, then we shouldn't enforce the fileLimit in case fileOffset is not set to avoid returning partial results. http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@888 PS1, Line 888: warn I am not sure about the severity of this - with current default of 1M it makes sense, but someone may decrease it, leading to warning spew. http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1237 PS1, Line 1237: // Keep requesting until we get 0 files back (reached the end) : // The catalogd enforces the file limit per response Can't we deduce the total number of files from req? It would be better to avoid the extra RPC when possible. Besides the perf cost this also increased the chance of InconsistentMetadataFetchException http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1277 PS1, Line 1277: countIcebergContentFiles I would prefer to move this and mergeIcebergContentFiles to IcebergContentFileStore as static files. The reason is that CatalogdMetaProvider.java is already very large and complicated. -- To view, visit http://gerrit.cloudera.org:8080/24041 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f2c058b7cc8efc15bac9fe0e91baadbb7b92cbb Gerrit-Change-Number: 24041 Gerrit-PatchSet: 1 Gerrit-Owner: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 02 Mar 2026 18:16:43 +0000 Gerrit-HasComments: Yes
