Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................

IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Before this patch Impala inconsistently scheduled scan ranges for
Iceberg tables on HDFS, in local catalog mode. It did so because
LocalIcebergTable reloaded all the file descriptors, and the HDFS
block locations were not consistent across the reloads. Impala's
scheduler uses the block location list for scan range assignment,
hence the assignments were inconsistent between queries. This had
a negative effect on caching and hence hurt performance quite badly.

It is redundant and expensive to reload file descriptors for each
query in local catalog mode. This patch extends the GetPartialInfo()
RPC with Iceberg-specific snapshot information. It means that the
coordinator is now able to fetch Iceberg data file descriptors from
the CatalogD. This way scan range assignment becomes consistent
because we reuse the same file descriptors with the same block
location information.

Fixing the above revealed another bug. Before this patch we didn't
handle self-events of Iceberg tables. When an Iceberg table is
stored in the HiveCatalog, Iceberg updates the HMS table on
modifications. Catalogd then processes these modifications again
when they arrive via the event notification mechanism. I fixed this
by creating Iceberg transactions in which I set the catalog service
ID and the new catalog version for the Iceberg table.

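The scheduling problem described above can be illustrated with a toy
model. This is only a sketch under stated assumptions, not Impala's
scheduler: the names FileDescriptor, load_descriptor and
assign_scan_ranges are hypothetical, and the "pick the first host in
the block location list" policy is a deliberate simplification. It
only shows why reloading descriptors (with replica hosts returned in
a different order each time) can change assignments between queries,
while reusing one cached descriptor keeps them stable.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FileDescriptor:
    """Hypothetical stand-in: a path plus the replica hosts of each block."""
    path: str
    block_hosts: Tuple[Tuple[str, ...], ...]


def load_descriptor(path: str, replicas: List[str], num_blocks: int,
                    rng: random.Random) -> FileDescriptor:
    """Simulates a (re)load: same replicas, but in an arbitrary order per block."""
    blocks = []
    for _ in range(num_blocks):
        hosts = list(replicas)
        rng.shuffle(hosts)  # assumption: ordering is not stable across reloads
        blocks.append(tuple(hosts))
    return FileDescriptor(path, tuple(blocks))


def assign_scan_ranges(fd: FileDescriptor) -> List[str]:
    """Toy scheduler: each block goes to the first host in its location list."""
    return [hosts[0] for hosts in fd.block_hosts]


rng = random.Random(0)
replicas = ["host-a", "host-b", "host-c"]

# Old behaviour (simplified): every query reloads the descriptor.
reloaded = [
    assign_scan_ranges(load_descriptor("data/f.parquet", replicas, 8, rng))
    for _ in range(5)
]

# New behaviour (simplified): load once, reuse the descriptor for every query.
cached_fd = load_descriptor("data/f.parquet", replicas, 8, rng)
cached = [assign_scan_ranges(cached_fd) for _ in range(5)]

distinct_reloaded = len({tuple(a) for a in reloaded})
distinct_cached = len({tuple(a) for a in cached})
```

With the cached descriptor every query sees the same assignment
(distinct_cached is 1), so a host-local data cache keeps being hit;
with per-query reloads the assignment can change from query to query.
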
Testing:
 * added e2e test for the scan range assignment

Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M tests/query_test/test_iceberg.py
16 files changed, 290 insertions(+), 104 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/17857/1
--
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>