Hello Tamas Mate, lipeng...@sensorsdata.cn, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/19043 to look at the new patch set (#6).

Change subject: IMPALA-11591: Avoid calling planFiles() on Iceberg tables
......................................................................

IMPALA-11591: Avoid calling planFiles() on Iceberg tables

Iceberg's planFiles() API is very expensive because it needs to read all
the relevant manifest files. It's especially expensive on object stores
like S3.

When there are no predicates on the table and we are not doing time
travel, it's possible to avoid calling planFiles() and do the scan
planning from cached metadata. When none of the predicates are on
partition columns, there's little benefit in pushing down predicates to
Iceberg. So with this patch we only push down predicates (and hence
invoke planFiles()) when at least one of the predicates is on a
partition column.

This patch introduces a new class to store content files:
IcebergContentFileStore. It separates data, delete, and "old" content
files. "Old" content files are the ones that are not part of the current
snapshot. We add such data files during time travel. Storing "old"
content files in a separate concurrent hash map also fixes a concurrency
bug in the current code.

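The planning rule described above can be sketched roughly as follows. This is only an illustration of the commit message, not code from the patch: the class PlanFilesDecision, the method needsPlanFiles(), its parameters, and the column names are all hypothetical, and it assumes that each predicate references a single column and that time travel always goes through planFiles().

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the planning decision described in the commit
 * message. None of these names come from the Impala code base.
 */
final class PlanFilesDecision {
  private PlanFilesDecision() {}

  /**
   * Returns true if scan planning should push predicates down to Iceberg
   * (and therefore call planFiles()), false if cached metadata is enough.
   *
   * @param predicateColumns column referenced by each predicate, one entry
   *                         per predicate (assumption: single-column predicates)
   * @param partitionColumns columns the table is partitioned on
   * @param isTimeTravel     whether the query reads a non-current snapshot
   */
  static boolean needsPlanFiles(List<String> predicateColumns,
      Set<String> partitionColumns, boolean isTimeTravel) {
    // Assumption: time travel cannot be planned from cached metadata.
    if (isTimeTravel) return true;
    // Push down only if at least one predicate is on a partition column.
    for (String col : predicateColumns) {
      if (partitionColumns.contains(col)) return true;
    }
    // No predicates, or only predicates on non-partition columns.
    return false;
  }

  public static void main(String[] args) {
    Set<String> partitionCols = Set.of("event_date");
    // No predicates, no time travel: plan from cached metadata.
    System.out.println(needsPlanFiles(List.of(), partitionCols, false));
    // Predicate on a non-partition column only: still skip planFiles().
    System.out.println(needsPlanFiles(List.of("user_id"), partitionCols, false));
    // At least one predicate on a partition column: call planFiles().
    System.out.println(needsPlanFiles(
        List.of("user_id", "event_date"), partitionCols, false));
    // Time travel: call planFiles() even without predicates.
    System.out.println(needsPlanFiles(List.of(), partitionCols, true));
  }
}
```

Running main() prints false, false, true, true, matching the four cases in the comments.
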
Testing:
 * executed current e2e tests
 * updated predicate push down tests

Change-Id: Iadb883a28602bb68cf4f61e57cdd691605045ac5
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-compound-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-is-null-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-multiple-storage-locations-table.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-runtime-filter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_runtime_filters.py
18 files changed, 492 insertions(+), 223 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/19043/6
--
To view, visit http://gerrit.cloudera.org:8080/19043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iadb883a28602bb68cf4f61e57cdd691605045ac5
Gerrit-Change-Number: 19043
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <lipeng...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>