lipeng...@sensorsdata.cn has uploaded a new patch set (#8). (http://gerrit.cloudera.org:8080/18894)

Change subject: IMPALA-11507: Use absolute_path when Iceberg data files are outside of the table location
......................................................................

IMPALA-11507: Use absolute_path when Iceberg data files are outside of the table location

For Iceberg tables, if any of the following properties is set, the table
is considered to possibly have data files outside the table location
directory:
- 'write.object-storage.enabled' is true
- 'write.data.path' is not empty
- 'write.location-provider.impl' is configured
- 'write.object-storage.path' (deprecated) is not empty
- 'write.folder-storage.path' (deprecated) is not empty
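As a minimal sketch of the rule above, the check could be expressed in
Java as follows. Only the property keys come from the list; the class
and method names are hypothetical and not taken from this patch. It
assumes 'write.object-storage.enabled' is parsed as a boolean and that
"configured" for 'write.location-provider.impl' means a non-empty value.

```java
import java.util.Map;

/**
 * Hypothetical helper (not Impala's actual code): decides whether an
 * Iceberg table's properties allow data files outside the table location.
 */
final class IcebergLocationCheck {
  // Iceberg table property keys listed in the commit message.
  static final String OBJECT_STORAGE_ENABLED = "write.object-storage.enabled";
  static final String WRITE_DATA_PATH = "write.data.path";
  static final String LOCATION_PROVIDER_IMPL = "write.location-provider.impl";
  // Deprecated keys that are still honored.
  static final String OBJECT_STORAGE_PATH = "write.object-storage.path";
  static final String FOLDER_STORAGE_PATH = "write.folder-storage.path";

  private IcebergLocationCheck() {}

  private static boolean isNotEmpty(String value) {
    return value != null && !value.isEmpty();
  }

  /**
   * Returns true if object storage is enabled or any of the path/provider
   * properties has a non-empty value. A missing key counts as unset
   * (Boolean.parseBoolean(null) is false).
   */
  static boolean mayHaveDataOutsideLocation(Map<String, String> props) {
    return Boolean.parseBoolean(props.get(OBJECT_STORAGE_ENABLED))
        || isNotEmpty(props.get(WRITE_DATA_PATH))
        || isNotEmpty(props.get(LOCATION_PROVIDER_IMPL))
        || isNotEmpty(props.get(OBJECT_STORAGE_PATH))
        || isNotEmpty(props.get(FOLDER_STORAGE_PATH));
  }
}
```

A table with none of these properties keeps the default layout, so all
of its data files can be expected under the table location.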

We should tolerate data files whose relative paths cannot be derived
from the table location path, and use their absolute paths in that
case. For example, an ETL program may write a table whose Iceberg
metadata is placed in
'hdfs://nameservice_meta/warehouse/hadoop_catalog/ice_tbl/metadata',
its recent data files in
'hdfs://nameservice_data/warehouse/hadoop_catalog/ice_tbl/data', and
its data files from half a year ago in
's3a://nameservice_data/warehouse/hadoop_catalog/ice_tbl/data'. Impala
should still be able to query such a table normally.
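A simplified sketch of the fallback, assuming the table location in the
example above is 'hdfs://nameservice_meta/warehouse/hadoop_catalog/ice_tbl'.
The helper name is hypothetical, and it uses plain string prefix
matching; a real implementation would more likely normalize paths with
Hadoop's Path/URI classes.

```java
/**
 * Hypothetical helper (not Impala's actual code): returns a data file
 * path relative to the table location when possible, and falls back to
 * the absolute path when the file lives outside that location.
 */
final class IcebergFilePaths {
  private IcebergFilePaths() {}

  /**
   * Returns dataFile relative to tableLocation, or dataFile unchanged if it
   * is not under tableLocation (e.g. another nameservice or another
   * filesystem scheme such as s3a://).
   */
  static String relativeOrAbsolute(String tableLocation, String dataFile) {
    // Require a trailing '/' so "ice_tbl" and "ice_tbl/" behave the same and
    // a sibling directory such as "ice_tbl_data" is not treated as a child.
    String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    if (dataFile.startsWith(prefix)) {
      return dataFile.substring(prefix.length());
    }
    // Outside the table location: keep the absolute path.
    return dataFile;
  }
}
```

The trailing '/' check matters for layouts like the test data in this
change, where 'iceberg_multiple_storage_locations_data' shares a string
prefix with the table directory 'iceberg_multiple_storage_locations' but
is not inside it.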

Testing:
 - added e2e tests

Change-Id: I666bed21d20d5895f4332e92eb30a94fa24250be
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/scheduling/scheduler.cc
M common/fbs/CatalogObjects.fbs
M common/protobuf/planner.proto
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/ExplainTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/42056022-e2d2-4548-9376-8993109c2ace-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/b5880d95-f4f1-49cb-ba55-143c221017fe-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/ce7ad1c8-1ad5-4391-a640-b203d7c476a4-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/snap-4264681048229339305-1-b5880d95-f4f1-49cb-ba55-143c221017fe.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/snap-4265463682522664668-1-ce7ad1c8-1ad5-4391-a640-b203d7c476a4.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/snap-7684033746298894981-1-42056022-e2d2-4548-9376-8993109c2ace.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/version-hint.text
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data/col_int=0/00001-1-5a94b6af-6ee7-4910-9bf5-165a9a4e71df-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data/col_int=1/00001-1-5a94b6af-6ee7-4910-9bf5-165a9a4e71df-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data01/col_int=1/00001-1-7ac79643-e19f-4294-914e-7b122aff576c-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data01/col_int=2/00001-1-7ac79643-e19f-4294-914e-7b122aff576c-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data02/col_int=0/00001-1-26bc91ef-b403-4b65-a6b0-566396b8d097-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data02/col_int=2/00001-1-26bc91ef-b403-4b65-a6b0-566396b8d097-00001.parquet
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/iceberg-multiple-storage-locations-table.test
M tests/query_test/test_iceberg.py
38 files changed, 1,239 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/18894/8
--
To view, visit http://gerrit.cloudera.org:8080/18894

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I666bed21d20d5895f4332e92eb30a94fa24250be
Gerrit-Change-Number: 18894
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward <lipeng...@sensorsdata.cn>
Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zjsar...@gmail.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
