Qifan Chen has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................

IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Before this patch, Impala scheduled scan ranges inconsistently for
Iceberg tables on HDFS in local catalog mode. This happened because
LocalIcebergTable reloaded all the file descriptors, and the HDFS
block locations were not consistent across the reloads. Impala's
scheduler uses the block location list for scan range assignment,
hence the assignments differed between queries. This has a negative
effect on caching and hence hurts performance quite badly.

It is redundant and expensive to reload file descriptors for each
query in local catalog mode. This patch extends the GetPartialInfo()
RPC with Iceberg-specific snapshot information. It means that the
coordinator is now able to fetch Iceberg data file descriptors from
the CatalogD. This way scan range assignment becomes consistent
because we reuse the same file descriptors with the same block
location information.
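
As a minimal illustration of the point above (not Impala's actual
scheduler), the following Python sketch assumes a toy scheduler that
assigns each file to the first host in its replica list. The names
BLOCK_REPLICAS, load_file_descriptors, assign_scan_ranges and
CachedProvider are hypothetical and exist only for this example. It
shows why reusing one set of file descriptors gives stable assignments,
while a fresh load per query may not.

```python
import random
from typing import Dict, List, Optional

# Hypothetical stand-in for HDFS block metadata: file path -> replica hosts.
BLOCK_REPLICAS: Dict[str, List[str]] = {
    "data/a.parquet": ["host1", "host2", "host3"],
    "data/b.parquet": ["host2", "host3", "host1"],
}


def load_file_descriptors(rng: random.Random) -> Dict[str, List[str]]:
    """Simulates a fresh load; the replica order may differ between loads."""
    descriptors = {}
    for path, hosts in BLOCK_REPLICAS.items():
        shuffled = list(hosts)
        rng.shuffle(shuffled)
        descriptors[path] = shuffled
    return descriptors


def assign_scan_ranges(descriptors: Dict[str, List[str]]) -> Dict[str, str]:
    """Toy scheduler (an assumption): pick the first listed replica per file."""
    return {path: hosts[0] for path, hosts in descriptors.items()}


class CachedProvider:
    """Loads the descriptors once and hands out the same ones to every query."""

    def __init__(self, rng: random.Random) -> None:
        self._rng = rng
        self._cache: Optional[Dict[str, List[str]]] = None

    def get(self) -> Dict[str, List[str]]:
        if self._cache is None:
            self._cache = load_file_descriptors(self._rng)
        return self._cache
```

With CachedProvider, repeated calls to assign_scan_ranges(provider.get())
return the same mapping. Calling load_file_descriptors() for every query
gives no such guarantee in this sketch.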

Fixing the above revealed another bug. Before this patch we didn't
handle self-events of Iceberg tables. When an Iceberg table is stored
in the HiveCatalog, Iceberg updates the HMS table on every
modification because it needs to update the table property
'metadata_location' (which points to the new snapshot file).
Catalogd then processes these modifications again when they arrive
via the event notification mechanism. I fixed this by creating Iceberg
transactions in which I set the catalog service ID and the new catalog
version for the Iceberg table. Since we are using transactions now,
Iceberg has to embed all table modifications in a single ALTER TABLE
request to HMS, and Catalogd can detect the corresponding alter event
later via the aforementioned catalog service ID and version.
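
The self-event check described above can be sketched roughly as
follows. This is a simplified Python model under stated assumptions,
not the code in this patch: the property keys SERVICE_ID_KEY and
VERSION_KEY and the SelfEventTracker class are hypothetical names
chosen for illustration. The idea is that the writer stamps its
service ID and catalog version into the table properties, remembers the
version, and later skips an incoming alter event that carries the same
pair.

```python
from typing import Dict, Set

# Hypothetical property keys; the real keys used by Impala may differ.
SERVICE_ID_KEY = "example.catalogServiceId"
VERSION_KEY = "example.catalogVersion"


class SelfEventTracker:
    """Remembers the versions this service wrote so they can be skipped later."""

    def __init__(self, service_id: str) -> None:
        self.service_id = service_id
        self._in_flight: Set[int] = set()

    def stamp(self, table_props: Dict[str, str], version: int) -> Dict[str, str]:
        """Returns a copy of the properties with service ID and version set."""
        stamped = dict(table_props)
        stamped[SERVICE_ID_KEY] = self.service_id
        stamped[VERSION_KEY] = str(version)
        self._in_flight.add(version)
        return stamped

    def is_self_event(self, event_props: Dict[str, str]) -> bool:
        """True if the event was caused by this service; consumes the version."""
        if event_props.get(SERVICE_ID_KEY) != self.service_id:
            return False
        raw_version = event_props.get(VERSION_KEY)
        if raw_version is None or not raw_version.isdigit():
            return False
        version = int(raw_version)
        if version in self._in_flight:
            self._in_flight.remove(version)
            return True
        return False
```

In this model, an event stamped by another service, or one without the
two properties, is treated as an external change and processed
normally.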

Testing:
 * added e2e test for the scan range assignment
 * added e2e test for detecting self-events

Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Reviewed-on: http://gerrit.cloudera.org:8080/17857
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Qifan Chen <qc...@cloudera.com>
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/custom_cluster/test_events_custom_configs.py
M tests/metadata/test_show_create_table.py
M tests/query_test/test_iceberg.py
M tests/stress/test_insert_stress.py
21 files changed, 432 insertions(+), 140 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Qifan Chen: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
