Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )
Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables ...................................................................... Patch Set 1: (6 comments) Some high level comments. The code looks good to me so far, but I will go through it once more. http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11 PS1, Line 11: and the HDFS : block locations were not consistent across the reload This is not clear to me: if the block locations have changed between queries, then it makes sense to schedule them differently, but if they didn't change (which seems the more common case to me), then how are they different? http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@25 PS1, Line 25: Fixing the above revealed another bug. Before this patch we didn't : handle self-events of Iceberg tables. Can you add more information about this? For example what events are generated in HMS for different Iceberg operations? I guess that schema changes will be treated similarly to the case when Impala calls HMS directly, but I can't imagine what happens in case of INSERTs at the moment. http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@31 PS1, Line 31: service ID and new catalog version for the Iceberg table Doesn't Iceberg have something similar to catalog version, e.g. something like "snapshot id", that could be used to check whether we need to reload data? The current way we handle self-events is a horrible hack IMO which is needed because HMS doesn't provide a usable version number for table/partition metadata. Just hoping that Iceberg does have something like that :) http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@34 PS1, Line 34: * added e2e test for the scan range assignment It is a bit tricky, but self-event handling can be also tested, see https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java#L508 http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java: http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@104 PS1, Line 104: properties.setProperty("external.table.purge", "TRUE"); Is this change related to the ones mentioned in the commit message? http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@151 PS1, Line 151: properties.setProperty(IcebergTable.ICEBERG_CATALOG, : tableProps.get(IcebergTable.ICEBERG_CATALOG)); Is this change related to the ones mentioned in the commit message? -- To view, visit http://gerrit.cloudera.org:8080/17857 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5 Gerrit-Change-Number: 17857 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Comment-Date: Mon, 20 Sep 2021 15:40:32 +0000 Gerrit-HasComments: Yes