lipeng...@sensorsdata.cn has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/18894 )
Change subject: IMPALA-11507: Use absolute_path when Iceberg data files are outside of the table location
......................................................................

IMPALA-11507: Use absolute_path when Iceberg data files are outside of the table location

For Iceberg tables, when any of the following properties is set, the
table may have data files outside the table location directory:
- 'write.object-storage.enabled' is true
- 'write.data.path' is not empty
- 'write.location-provider.impl' is configured
- 'write.object-storage.path' (deprecated) is not empty
- 'write.folder-storage.path' (deprecated) is not empty

We should tolerate the case where the path of a data file relative to
the table location cannot be derived, and use the absolute path in
that case.

E.g. an ETL program may write a table whose Iceberg metadata is placed in
'hdfs://nameservice_meta/warehouse/hadoop_catalog/ice_tbl/metadata',
whose recent data files are in
'hdfs://nameservice_data/warehouse/hadoop_catalog/ice_tbl/data', and
whose data files from half a year ago are in
's3a://nameservice_data/warehouse/hadoop_catalog/ice_tbl/data'. Impala
should still be able to query such a table normally.

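The fallback described above can be illustrated with a minimal sketch. This is not the code in this patch: it is written in Python for brevity, and the names `FilePathRef`, `may_have_data_outside_table_location` and `resolve_data_file_path` are hypothetical. It assumes table properties arrive as a plain string-to-string dict and that paths are compared as plain strings.

```python
from typing import Dict, NamedTuple

# Properties listed in the commit message that must be non-empty/configured.
_PATH_PROPS = (
    "write.data.path",
    "write.location-provider.impl",
    "write.object-storage.path",   # deprecated
    "write.folder-storage.path",   # deprecated
)


class FilePathRef(NamedTuple):
    """Hypothetical holder: exactly one of the two fields is non-empty."""
    relative_path: str
    absolute_path: str


def may_have_data_outside_table_location(props: Dict[str, str]) -> bool:
    """True if any property from the commit message's list is set."""
    if props.get("write.object-storage.enabled", "false").lower() == "true":
        return True
    return any(props.get(key) for key in _PATH_PROPS)


def resolve_data_file_path(table_location: str, file_path: str) -> FilePathRef:
    """Use a relative path when the file is under the table location,
    otherwise fall back to the absolute path."""
    # The trailing slash keeps '.../tbl' from matching '.../tbl_data/...'.
    base = table_location.rstrip("/") + "/"
    if file_path.startswith(base):
        return FilePathRef(relative_path=file_path[len(base):], absolute_path="")
    return FilePathRef(relative_path="", absolute_path=file_path)
```

With the example locations from the commit message, a file under 'hdfs://nameservice_data/...' or 's3a://nameservice_data/...' does not share a prefix with a table located under 'hdfs://nameservice_meta/...', so this sketch would keep its absolute path.
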
Testing:
- added e2e tests

Change-Id: I666bed21d20d5895f4332e92eb30a94fa24250be
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/scheduling/scheduler.cc
M common/fbs/CatalogObjects.fbs
M common/protobuf/planner.proto
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/ExplainTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/42056022-e2d2-4548-9376-8993109c2ace-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/b5880d95-f4f1-49cb-ba55-143c221017fe-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/ce7ad1c8-1ad5-4391-a640-b203d7c476a4-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/snap-4264681048229339305-1-b5880d95-f4f1-49cb-ba55-143c221017fe.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/snap-4265463682522664668-1-ce7ad1c8-1ad5-4391-a640-b203d7c476a4.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/snap-7684033746298894981-1-42056022-e2d2-4548-9376-8993109c2ace.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/version-hint.text
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data/col_int=0/00001-1-5a94b6af-6ee7-4910-9bf5-165a9a4e71df-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data/col_int=1/00001-1-5a94b6af-6ee7-4910-9bf5-165a9a4e71df-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data01/col_int=1/00001-1-7ac79643-e19f-4294-914e-7b122aff576c-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data01/col_int=2/00001-1-7ac79643-e19f-4294-914e-7b122aff576c-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data02/col_int=0/00001-1-26bc91ef-b403-4b65-a6b0-566396b8d097-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data02/col_int=2/00001-1-26bc91ef-b403-4b65-a6b0-566396b8d097-00001.parquet
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/iceberg-multiple-storage-locations-table.test
M tests/query_test/test_iceberg.py
38 files changed, 1,239 insertions(+), 93 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/18894/8
--
To view, visit http://gerrit.cloudera.org:8080/18894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I666bed21d20d5895f4332e92eb30a94fa24250be
Gerrit-Change-Number: 18894
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward <lipeng...@sensorsdata.cn>
Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zjsar...@gmail.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>