Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/23985 )


Change subject: IMPALA-11986: (part 1) Optimize partition key scans for Iceberg 
tables
......................................................................

IMPALA-11986: (part 1) Optimize partition key scans for Iceberg tables

This patch optimizes queries that only scan IDENTITY-partitioned
columns. The optimization applies only if:
* All materialized aggregate expressions have distinct semantics
  (e.g. MIN, MAX, NDV). In other words, this optimization will work
  for COUNT(DISTINCT c) but not COUNT(c).
* All materialized columns are IDENTITY-partitioned in all partition
  specs (this can be relaxed later)
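
The two conditions above can be sketched as a small eligibility check.
This is only an illustration in Python, not the planner code from this
patch: the names AggExpr, DUPLICATE_INSENSITIVE and
can_use_partition_key_scan are hypothetical, and partition specs are
modeled as plain dicts mapping a source column to its transform name.

```python
from dataclasses import dataclass

# Aggregates whose result does not change when input values repeat.
# Illustrative set only; the commit message names MIN, MAX, NDV and
# COUNT(DISTINCT c) as examples.
DUPLICATE_INSENSITIVE = {"MIN", "MAX", "NDV", "COUNT_DISTINCT"}


@dataclass(frozen=True)
class AggExpr:
    func: str    # e.g. "MIN", "COUNT", "COUNT_DISTINCT"
    column: str  # the column the aggregate reads


def can_use_partition_key_scan(agg_exprs, materialized_columns, partition_specs):
    """Return True if both conditions from the commit message hold.

    partition_specs is a list of dicts (one per partition spec of the
    table), each mapping a source column name to a transform name.
    """
    if not partition_specs:
        return False
    # Condition 1: every aggregate has distinct semantics.
    if not all(a.func in DUPLICATE_INSENSITIVE for a in agg_exprs):
        return False
    # Condition 2: every materialized column is IDENTITY-partitioned
    # in every partition spec.
    return all(
        spec.get(col) == "IDENTITY"
        for spec in partition_specs
        for col in materialized_columns
    )
```

Under these assumptions, MIN(p) over a column p that is
IDENTITY-partitioned in every spec qualifies, while COUNT(p) or a spec
that uses a different transform on p does not.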

If the above conditions are met, then each data file (without deletes)
produces only a single record. The rest of the table (data files with
deletes and delete files) is scanned normally.
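
A rough model of the effect on the number of records read, again as a
Python sketch rather than the actual scan node logic. DataFile and
records_to_read are hypothetical names. One record per file is enough
because all rows of a data file share the same values for
IDENTITY-partitioned columns, and duplicates do not change the result
of aggregates with distinct semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    record_count: int
    has_deletes: bool  # True if any delete file applies to this data file


def records_to_read(files, optimized):
    """Count the records the scan would produce for the given data files."""
    total = 0
    for f in files:
        if optimized and not f.has_deletes:
            # One representative record stands in for the whole file.
            total += 1
        else:
            # Files with deletes are scanned normally.
            total += f.record_count
    return total


files = [
    DataFile("a.parquet", 100, has_deletes=False),
    DataFile("b.parquet", 50, has_deletes=True),
]
print(records_to_read(files, optimized=False))  # 150
print(records_to_read(files, optimized=True))   # 51
```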

Testing:
* added e2e tests

Change-Id: I32f78ee60ac4a410e91cf0e858199dd39d2e9afe
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-key-scans.test
M tests/query_test/test_iceberg.py
5 files changed, 206 insertions(+), 12 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/23985/1
--
To view, visit http://gerrit.cloudera.org:8080/23985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I32f78ee60ac4a410e91cf0e858199dd39d2e9afe
Gerrit-Change-Number: 23985
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
