Hello Andrew Sherman, Tamas Mate, Daniel Becker, Zoltan Borok-Nagy, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20753

to look at the new patch set (#7).

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
......................................................................

IMPALA-12597: Basic Equality delete read support for Iceberg tables

In general, applying equality deletes is similar to how position
deletes are applied to data files: using a LEFT ANTI JOIN where the
SCAN for the data rows is on the left side while the SCAN for the
delete rows is on the right side of the JOIN. The difference is the
virtual columns and the conjuncts being used.
For equality deletes the data sequence number of a delete file has to
be greater than the data sequence number of the data file being
investigated. This information is added as a virtual column to the
scans and a conjunct is created in the JOIN node to check the relation.
The equality delete fields from the delete files are checked agains the
respective columns of the data SCANS.

This patch makes it possible for Impala to read Iceberg tables with
basic equality delete files. The Iceberg spec gives great flexibility
for engines for writing equality deletes, however in practice Flink,
one of the engines that write EQ-deletes supports only a subset of the
use cases. This patch focuses on reading the EQ-deletes written by
Flink.

The restrictions are the following:
- All equality delete files in a table should have the same equality
  field ID list.
- For partitioned Iceberg tables it is expected that the partition
  values are also written into the equality delete files.
- Tables with equality deletes shouldn't have partition or schema
  evolution.
- Floating point equality columns aren't supported.
- If a malformed equality delete file doesn't have some of the equality
  field IDs then Parquet reader will fill those missing fields with
  NULLs. As a side effect this will drop the rows from the result where
  the corresponding data columns has a null value.
See IMPALA-11388 epic Jira for more details.

Testing:
- Checked if the existing functional_parquet.iceberg_v2_delete_equality
  table can be read successfully.
- Added new test table so that E2E tests can validate correctness.

Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
---
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.h
M common/thrift/CatalogObjects.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
A fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-00001.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-00002.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-00001.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-00002.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e00000001_800513971_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/bb4b8c07-84e1-421a-bb6c-594f297d118e-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-3802179086205335895-1-3d36bf90-2625-4625-b09b-d4359b979df9.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-8985205515767142888-1-0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-911559291487642581-1-bb4b8c07-84e1-421a-bb6c-594f297d118e.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/tmp.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f00000000_1372039299_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e035009600000001_138281890_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a00000000_909175610_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/data/a94b351bfa56dbd8-ddb31c6400000000_1397530881_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/data/delete-494fbbd4b792bdd2-aabb8e2000000000_1239775374_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/data/e6484a6fb0a2b4e6-d242a84000000000_1980761347_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/25ce5480-23b6-4c70-a724-63931f8d84c6-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/87d3b6df-f00d-40a4-aafa-5d7f20e3299b-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/c8b17188-94bf-4496-9069-3eda900cd71d-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/snap-4301391241829251636-1-25ce5480-23b6-4c70-a724-63931f8d84c6.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/snap-4346796256488077976-1-87d3b6df-f00d-40a4-aafa-5d7f20e3299b.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/snap-9091814429631192676-1-c8b17188-94bf-4496-9069-3eda900cd71d.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/data/d=2023-12-24/00000-0-bdb0c103-bfaf-49b4-935a-16a951e55b1c-00001.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/data/d=2023-12-24/00000-0-bdb0c103-bfaf-49b4-935a-16a951e55b1c-00002.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/data/d=2023-12-24/00000-0-fe90d7bb-47e7-4221-8e06-f63922d3fa23-00001.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/data/d=2023-12-24/00000-0-fe90d7bb-47e7-4221-8e06-f63922d3fa23-00002.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/807d7d2a-0557-4bbe-9f07-9467de72598a-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/807d7d2a-0557-4bbe-9f07-9467de72598a-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/c9c459ff-9747-4dab-b65f-cace0f31e669-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/c9c459ff-9747-4dab-b65f-cace0f31e669-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/snap-3409033829588781878-1-c9c459ff-9747-4dab-b65f-cace0f31e669.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/snap-586812101365618837-1-807d7d2a-0557-4bbe-9f07-9467de72598a.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-24/00000-0-0c3800d3-c638-4591-b40c-158dcd5ebe25-00001.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-24/00000-0-0c3800d3-c638-4591-b40c-158dcd5ebe25-00002.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-24/00000-0-c6e2da66-fe58-44b5-81bd-575da62c7a91-00001.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-24/00000-0-c6e2da66-fe58-44b5-81bd-575da62c7a91-00002.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-25/00000-0-759289e0-d713-41a1-bdaf-f9feab643720-00001.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-25/00000-0-759289e0-d713-41a1-bdaf-f9feab643720-00002.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-25/00000-0-e1567ae8-d9c3-4071-b671-8bbbe79d36d1-00001.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-25/00000-0-e1567ae8-d9c3-4071-b671-8bbbe79d36d1-00002.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/09ca0acc-c19e-4073-80f7-b476a6e568c7-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/09ca0acc-c19e-4073-80f7-b476a6e568c7-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/57f8cd32-619c-46fc-a683-1dee7473c990-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/57f8cd32-619c-46fc-a683-1dee7473c990-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/db5041df-259d-48b9-ade1-1bf382a93d5a-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/db5041df-259d-48b9-ade1-1bf382a93d5a-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/e3dac70e-a8aa-4d15-9d35-20c4f25f36d5-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/e3dac70e-a8aa-4d15-9d35-20c4f25f36d5-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/snap-3217166167862484560-1-db5041df-259d-48b9-ade1-1bf382a93d5a.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/snap-3979814664665937114-1-09ca0acc-c19e-4073-80f7-b476a6e568c7.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/snap-4821964189199835313-1-e3dac70e-a8aa-4d15-9d35-20c4f25f36d5.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/snap-7564375228633944060-1-57f8cd32-619c-46fc-a683-1dee7473c990.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/v5.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test
M tests/query_test/test_iceberg.py
108 files changed, 3,226 insertions(+), 251 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/20753/7
--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>

Reply via email to