Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )
Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 2:

(19 comments)

Thanks everyone for the comments!

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11
PS1, Line 11: and the HDFS
            : block locations were not consistent across the reload
> +1.
When we retrieve the file status from HDFS we also get block location information, i.e. for each block we get a list of where its replicas can be found. The order of the elements in this list is random for each file status call. The order of the block locations matters during scheduling. In other words, even if the state of the HDFS blocks remains the same, Impala might schedule scan ranges differently because it sees the block locations in a different order.
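Just to make the ordering issue above concrete, here is a small self-contained sketch against the plain Hadoop FileSystem API (illustration only, not part of this patch; the class name is made up): it fetches the block locations of the same unchanged file twice and shows that only the order of the replica hosts differs, which is exactly what made scan range scheduling drift across reloads.

    import java.io.IOException;
    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustration only (not part of the patch): two consecutive block
    // location lookups for the same, unchanged file may return the replica
    // hosts of each block in a different order.
    public class BlockLocationOrderDemo {
      public static void main(String[] args) throws IOException {
        Path file = new Path(args[0]);  // some HDFS data file of the table
        FileSystem fs = file.getFileSystem(new Configuration());
        FileStatus status = fs.getFileStatus(file);

        BlockLocation[] first = fs.getFileBlockLocations(status, 0, status.getLen());
        BlockLocation[] second = fs.getFileBlockLocations(status, 0, status.getLen());

        for (int i = 0; i < first.length; ++i) {
          String[] hostsA = first[i].getHosts();
          String[] hostsB = second[i].getHosts();
          // The raw host arrays can come back in different orders...
          System.out.println("block " + i + ": " + Arrays.toString(hostsA)
              + " vs " + Arrays.toString(hostsB));
          // ...while the sorted copies are identical (assuming no
          // re-replication happened in between), i.e. only the order changed.
          Arrays.sort(hostsA);
          Arrays.sort(hostsB);
          assert Arrays.equals(hostsA, hostsB);
        }
      }
    }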
http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@25
PS1, Line 25: Fixing the above revealed another bug. Before this patch we didn't
            : handle self-events of Iceberg tables.
> Can you add more information about this? For example what events are genera
I extended the following sentence.


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@31
PS1, Line 31: via the event notification mechanism. I fixed this by cr
> +1 if we have native version number support in Iceberg it is better to use
Iceberg tables can be stored in different Iceberg catalogs. One of them is HiveCatalog, which uses HMS to store information about Iceberg tables. Most importantly, it stores snapshot information in the table property 'metadata_location'. We could check whether that has changed. But if the Iceberg table is stored in HadoopTables/HadoopCatalog, then we would need to do expensive IO operations to determine the snapshot ID.

I think we could use the snapshot ID in IcebergTable.load() to avoid reloading a lot of information. But we still need to detect self-events, because the catalogVersion of the table gets updated and that can cause exceptions in local catalog mode (InconsistentMetadataFetchException with the message "Catalog object %s changed version between accesses.").

I'm looking forward to the overhaul of the self-events logic and how it will work with Iceberg tables in HiveCatalog (where we don't have control over the HMS RPCs).
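To illustrate the HiveCatalog shortcut mentioned above, here is a minimal sketch (illustration only; the class and helper names are made up and this is not what the patch implements) that compares the 'metadata_location' property of two versions of the HMS table object to decide whether the snapshot moved at all.

    import java.util.Map;
    import java.util.Objects;

    import org.apache.hadoop.hive.metastore.api.Table;

    // Hypothetical helper, for illustration only: HiveCatalog records the
    // current Iceberg metadata file in the 'metadata_location' table property,
    // so an unchanged value means the snapshot did not move and a full reload
    // of file metadata could be skipped.
    public final class IcebergMetadataLocationCheck {
      private static final String METADATA_LOCATION = "metadata_location";

      public static boolean snapshotChanged(Table before, Table after) {
        return !Objects.equals(prop(before), prop(after));
      }

      private static String prop(Table tbl) {
        Map<String, String> params = tbl.getParameters();
        return params == null ? null : params.get(METADATA_LOCATION);
      }
    }

As noted above, this shortcut only exists for HiveCatalog tables; with HadoopTables/HadoopCatalog the current metadata pointer lives on the file system, so determining the snapshot ID needs extra IO.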
http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@34
PS1, Line 34: Iceberg has to embed all table modifications in a single ALTER TABLE
> It is a bit tricky, but self-event handling can be also tested, see https:/
Added e2e tests.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@348
PS1, Line 348: hdfsTable_
             :   .load(false, msClient, msTable_, true, true, false, null, null,null, reason);
             : pathHashToFileDescMap_ = Utils.loadAllPartition(this);
> This is probably unrelated to this patch I was curious to understand what i
Yeah, it's on the radar: IMPALA-10254

Currently the metadata handling of Iceberg tables is quite ugly and redundant, especially for file metadata. We load all the file descriptors recursively here via file listings, then we also load the file descriptors one-by-one... I think as a first step we should at least reuse the file descriptors of the underlying hdfsTable_, because the recursive listing is probably more efficient than the one-by-one file status RPCs.

But the ideal would be to refactor Iceberg tables, hopefully get rid of the underlying HdfsTable, and most importantly create the file descriptors from the information in the Iceberg metadata files. The latter might not be achievable when HDFS is used to store the data files, because we need block location information. But on S3 we should only use Iceberg metadata. The only challenge there is to come up with a "modification time" for the files, which is needed by the remote data cache (hopefully we can use the snapshot time if nothing verifies it).


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@481
PS1, Line 481: TGetPartialCatalogObjectResponse resp =
             :   getHdfsTable().getPartialInfo(req, missingPartialInfos);
             : if (req.table_info_selector.want_iceberg_snapshot) {
             :   resp.table_info.setIceberg_snapshot(
             :     FeIcebergTable.Utils.createTIcebergSnapshot(this));
             : }
> are we sending some file descriptors twice here? Some from the hdfsTable an
Yes, we send all data file descriptors twice. Let me deal with it in a separate patch.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@104
PS1, Line 104: properties.setProperty("external.table.purge", "TRUE");
> Is this change related to the ones mentioned in the commit message?
Yes, because before this change we issued an ALTER TABLE statement to set this property, which we then processed later during event processing.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@151
PS1, Line 151: properties.setProperty(IcebergTable.ICEBERG_CATALOG,
             :   tableProps.get(IcebergTable.ICEBERG_CATALOG));
> Is this change related to the ones mentioned in the commit message?
Yes, because in the past we set it in a separate ALTER TABLE.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@110
PS1, Line 110: "Failed to load table: %s.%s", msTable.getDbName(), msTable.getTableName()), e);
> nit probably should include the snapshot id in the message, if it is not in
We would also retrieve the snapshot id in loadIcebergSnapshot().


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@116
PS1, Line 116: pathHashToFileDescMap_
> Am I correct in understanding that pathHashToFileDescMap_ is always loaded
Yes, we shouldn't load the file descriptors twice, but currently things break if we don't do that. Let me deal with it in a separate patch, probably in the context of IMPALA-10254.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1227
PS1, Line 1227: a
> nit.
> an
Done


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1268
PS1, Line 1268: addSummary(response, "Updated table.");
> Updated table after set table properties
I used the same message that we use at L1079.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1272
PS1, Line 1272: Updated table.
> Updated table after unset table properties
I used the same message that we use at L1090.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1295
PS1, Line 1295: if (!needsToUpdateHms) {
              :   loadTableMetadata(tbl, newCatalogVersion, true, true, null, "ALTER Iceberg TABLE " +
              :     params.getAlter_type().name());
              :   catalog_.addVersionsForInflightEvents(false, tbl, newCatalogVersion);
              :   addTableToCatalogUpdate(tbl, wantMinimalResult, response.result);
              :   return false;
              : }
> Should this block of code be executed within 1283-1286 too? That is, if txn
SET PARTITION SPEC doesn't use an Iceberg transaction, and we don't need to update HMS either.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2915
PS1, Line 2915: IcebergCatalogOpExecutor.addCatalogVersionToTxn(iceTxn,
> It is not clear to me why do we need to increment the catalog version here
dropTableStats() issues an ALTER TABLE statement, then iceTxn.commitTransaction() issues another ALTER TABLE statement, therefore we need to invoke catalog_.addVersionsForInflightEvents() twice (once at L2914 and once at L2773). It looked reasonable to me to use two different catalogVersions, but it seems like using the same catalogVersion twice also works.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2918
PS1, Line 2918: iceTxn.commitTransaction();
              : return newCatalogVersion;
> Can you move these statements into a finally block and avoid code duplicati
Actually we can remove these two lines, since we have the same two lines right after the if statement. I'm not sure about putting commitTransaction() into a finally block; I'd think we don't want to commit in case of exceptions.
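On the finally-block question above, here is a minimal sketch using only the public Iceberg Transaction API (illustration only; the class and method names are made up and this is not the code in CatalogOpExecutor) of why commitTransaction() belongs on the success path rather than in a finally block.

    import org.apache.iceberg.Table;
    import org.apache.iceberg.Transaction;

    // Illustration only: all changes are staged on the transaction and become
    // visible in one atomic step when commitTransaction() runs. If the commit
    // sat in a finally block, a half-prepared transaction would be published
    // even when one of the staging steps threw an exception.
    public final class IcebergTxnPattern {
      public static void setProperty(Table table, String key, String value) {
        Transaction txn = table.newTransaction();
        // Stage the change; readers of the table don't see it yet.
        txn.updateProperties().set(key, value).commit();
        // Further staged operations (schema changes, etc.) could go here and
        // may throw; in that case the transaction is simply abandoned and the
        // table is left untouched.

        // Commit only if everything above succeeded.
        txn.commitTransaction();
      }
    }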
http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6070
PS1, Line 6070:
> Can we add test coverage for this in the test_self_events?
Added a test to test_events_custom_configs.py.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@113
PS1, Line 113: addColumn
> nit. addColumns()?
Done


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@339
PS1, Line 339: table properties usi
> nit. update property in 'txn'
Updated the comment.


--
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Sep 2021 13:34:40 +0000
Gerrit-HasComments: Yes