Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/24042 )
Change subject: IMPALA-14592: Read Row Lineage of Iceberg tables
......................................................................
IMPALA-14592: Read Row Lineage of Iceberg tables
Iceberg V3 added mandatory row lineage tracking for Iceberg tables.
This means each row has a row-id and a last-updated-sequence-number
associated with it. These are either stored in the data files, or can
be calculated from file metadata in the following way:
* row-id: _row_id field of the record. If missing or NULL, then it
is the first-row-id of the DataFile plus FILE__POSITION.
* last-updated-sequence-number: _last_updated_sequence_number of the
record. If missing or NULL, then it is the data-sequence-number of
the DataFile.
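The fallback rules above can be sketched in Python as follows. This is only
an illustration, not Impala code: the function and parameter names are
hypothetical, and None stands in for a missing or NULL value.

```python
from typing import Optional


def resolve_row_id(stored_row_id: Optional[int],
                   first_row_id: Optional[int],
                   file_position: int) -> Optional[int]:
    """Return the stored _row_id if present, else derive it from metadata."""
    if stored_row_id is not None:
        return stored_row_id
    if first_row_id is None:
        # Assumption: with no first-row-id there is nothing to derive from,
        # so the result stays NULL (mirrors SQL NULL arithmetic).
        return None
    return first_row_id + file_position


def resolve_last_updated_seq(stored_seq: Optional[int],
                             data_sequence_number: Optional[int]) -> Optional[int]:
    """Return the stored value if present, else the file's data-sequence-number."""
    if stored_seq is not None:
        return stored_seq
    return data_sequence_number
```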
To support Row Lineage in Impala, we introduce the concept of Hidden
Columns. Hidden Columns are columns of a table that can be stored in
the data files along with the data, but they don't participate in
'select *' expansion and they are non-modifiable. Some DBs refer to such
columns as "system columns". They are different from Virtual Columns
as Virtual Columns are not stored in the data files.
We introduce the following Hidden Columns:
* _file_row_id: BIGINT field with field id 2147483540.
* _file_last_updated_sequence_number: BIGINT field with field id
2147483539
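As a rough illustration of how Hidden Columns stay out of 'select *'
expansion while remaining addressable by name, consider the Python sketch
below. The Column class, its hidden flag, and the user columns "id" and
"name" are hypothetical and do not reflect the actual frontend classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Column:
    name: str
    hidden: bool = False  # hypothetical flag marking a Hidden Column


def expand_star(columns: List[Column]) -> List[str]:
    """Names produced by 'select *': Hidden Columns are skipped."""
    return [c.name for c in columns if not c.hidden]


def resolve_column(columns: List[Column], name: str) -> Column:
    """Explicit references by name still resolve, hidden or not."""
    for c in columns:
        if c.name == name:
            return c
    raise KeyError(name)


TABLE_COLUMNS = [
    Column("id"),
    Column("name"),
    Column("_file_row_id", hidden=True),
    Column("_file_last_updated_sequence_number", hidden=True),
]
```

Here expand_star(TABLE_COLUMNS) yields only the user columns, while
resolve_column(TABLE_COLUMNS, "_file_row_id") still finds the Hidden Column.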
We also introduce the following Virtual Column:
* ICEBERG__FIRST__ROW__ID: returns the first-row-id of the DataFile.
This is stored in the metadata, once for each data file; it is not
present in the data files.
Now we can calculate the Iceberg V3 row-id and
last-updated-sequence-number in the following way:
* row-id:
COALESCE(_file_row_id,
ICEBERG__FIRST__ROW__ID + FILE__POSITION)
* last-updated-sequence-number:
COALESCE(_file_last_updated_sequence_number,
ICEBERG__DATA__SEQUENCE__NUMBER)
Later we might add syntactic sugar for the above; for now, this patch
set only makes it possible to calculate the values via the above
expressions.
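For illustration, the expressions could be embedded in a query as sketched
below. The helper name, the output aliases, and the table name are
hypothetical; only the column names and the COALESCE expressions are taken
from this commit message.

```python
ROW_ID_EXPR = (
    "COALESCE(_file_row_id, "
    "ICEBERG__FIRST__ROW__ID + FILE__POSITION)")
LAST_UPDATED_SEQ_EXPR = (
    "COALESCE(_file_last_updated_sequence_number, "
    "ICEBERG__DATA__SEQUENCE__NUMBER)")


def row_lineage_query(table: str) -> str:
    """Build a SELECT that returns all visible columns plus row lineage."""
    return (
        f"SELECT *, {ROW_ID_EXPR} AS row_id, "
        f"{LAST_UPDATED_SEQ_EXPR} AS last_updated_sequence_number "
        f"FROM {table}")


# "my_db.my_iceberg_v3_table" is a placeholder table name.
print(row_lineage_query("my_db.my_iceberg_v3_table"))
```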
Testing
* e2e tests added with Iceberg V3 tables written by Spark
Change-Id: I71b1076b25c9e7a0a6c9428b24abc986f5382c71
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAlterColStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableDropColStmt.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java
M fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/IcebergMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/data/00000-0-153001a8-dc43-4e8b-ad61-b691a1754e16-0-00001.parquet
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/data/00000-1-9e4c5793-eb01-410d-a963-807e22437794-0-00001.parquet
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/data/00000-1-e55b64a3-1aa3-4a3c-87a1-cd3d2988c499-0-00001.parquet
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/data/00000-2-d67e29ee-b654-4420-a7a5-9d7964ffd9c9-0-00001.parquet
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/7411e291-ddc0-4c54-9e25-75ef7878df0d-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/7411e291-ddc0-4c54-9e25-75ef7878df0d-m1.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/7411e291-ddc0-4c54-9e25-75ef7878df0d-m2.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/7411e291-ddc0-4c54-9e25-75ef7878df0d-m3.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/7a6ede87-b2d9-462e-9baa-77e456f07671-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/8ea2cf61-8fe7-4599-923a-d64b424cae3f-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/e46e6fcd-0a4e-4001-a0db-e199a5eb4227-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/fe2e965b-4685-4369-babf-31d13f81f10a-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/snap-2872597867664652808-1-fe2e965b-4685-4369-babf-31d13f81f10a.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/snap-5398841822738664432-1-7411e291-ddc0-4c54-9e25-75ef7878df0d.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/snap-7384452996480084466-1-e46e6fcd-0a4e-4001-a0db-e199a5eb4227.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/snap-8059325670730066324-1-8ea2cf61-8fe7-4599-923a-d64b424cae3f.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/v3.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/v4.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/v5.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/data/00000-0-e69cb204-0c90-4255-8b0b-7af3aec3f75d-0-00001.orc
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/data/00000-1-0ac66c53-638d-4aaf-9084-8a24b7aa2cdf-0-00001.orc
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/data/00000-2-f69a801d-ce1f-478e-98e4-f5321d122361-0-00001.orc
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/data/00000-3-84703627-8eea-44f1-a09b-e5bdad596090-0-00001.orc
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/3159a0a5-681d-4ac9-bf72-4be5814546cf-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/4e12ed17-3e31-4d27-b35f-55467a2bf5fe-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/8542d294-4d10-4efc-9e9d-69d3dce88108-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/8542d294-4d10-4efc-9e9d-69d3dce88108-m1.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/8542d294-4d10-4efc-9e9d-69d3dce88108-m2.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/8542d294-4d10-4efc-9e9d-69d3dce88108-m3.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/e5f99ba8-b804-434f-aa9e-d51e86cc0180-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/snap-1530771818348079345-1-3159a0a5-681d-4ac9-bf72-4be5814546cf.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/snap-7033898671372067760-1-4e12ed17-3e31-4d27-b35f-55467a2bf5fe.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/snap-7330590250419058232-1-8542d294-4d10-4efc-9e9d-69d3dce88108.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/snap-7480347588879981313-1-e5f99ba8-b804-434f-aa9e-d51e86cc0180.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/v4.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/v5.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/version-hint.text
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-row-lineage.test
M tests/query_test/test_iceberg.py
70 files changed, 582 insertions(+), 33 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/24042/1
--
To view, visit http://gerrit.cloudera.org:8080/24042
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I71b1076b25c9e7a0a6c9428b24abc986f5382c71
Gerrit-Change-Number: 24042
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>