Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit
    http://gerrit.cloudera.org:8080/16825

to look at the new patch set (#5).

Change subject: IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables
......................................................................

IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables

This patch adds support for INSERT INTO identity-partitioned Iceberg
tables.

Identity-partitioned Iceberg tables are similar to regular
partitioned tables; they are even stored in the same directory
structure. The difference is that the data files still store
the partitioning columns.

Partitioned Iceberg tables are stored as non-partitioned tables
in the Hive Metastore (similarly to partitioned Kudu tables). However,
the InsertStmt still generates the partition expressions for them.
These partition expressions are used to shuffle and sort the input
data so we don't end up writing too many files. The HdfsTableSink
also uses the partition expressions to write the data files with
the proper partition paths.

Iceberg is able to parse the partition paths to generate the
corresponding metadata for the partitions. This happens at the
end in IcebergCatalogOpExecutor.
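The flow described above (sort the input by the partition expressions, write one file per partition under a partition path, then recover the partition values from that path) can be sketched roughly as follows. This is only an illustration, not code from the patch: the helper names, the sample rows, and the percent-escaping of values are assumptions, and neither Impala nor Iceberg necessarily escapes or parses paths this way.

```python
from itertools import groupby
from urllib.parse import quote, unquote

# Hypothetical helpers for illustration only; not Impala or Iceberg APIs.

def partition_path(part_cols, row):
    """Build a 'col=value/...' directory for an identity-partitioned row."""
    # Percent-escaping is an assumption made so that '/' or '=' in a value
    # cannot break the path; the real escaping rules may differ.
    return "/".join(f"{c}={quote(str(row[c]), safe='')}" for c in part_cols)

def parse_partition_path(path):
    """Recover the partition values (as strings) from a 'col=value/...' path."""
    pairs = (segment.split("=", 1) for segment in path.split("/"))
    return {col: unquote(value) for col, value in pairs}

def plan_files(part_cols, rows):
    """Sort rows by partition key so each partition forms one contiguous run."""
    key = lambda r: tuple(r[c] for c in part_cols)
    files = {}
    for _, run in groupby(sorted(rows, key=key), key=key):
        run = list(run)
        # Whole rows are kept: the data files still store the partition columns.
        files[partition_path(part_cols, run[0])] = run
    return files

rows = [
    {"id": 1, "year": 2020, "month": 12},
    {"id": 2, "year": 2021, "month": 1},
    {"id": 3, "year": 2020, "month": 12},
]
files = plan_files(["year", "month"], rows)
```

Without the sort, rows of the same partition could arrive interleaved, and a writer that keeps one open file at a time would start a new file on every partition change; sorting first keeps this sketch at one file per partition.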
Testing:
 * added planner test to verify shuffling and sorting
 * added negative tests for unsupported features like PARTITION clause
   and non-identity partition transforms
 * e2e tests with partitioned inserts

TODO:
 * Current change includes some parts of IMPALA-10384 which need to be
   removed once https://gerrit.cloudera.org/#/c/16850/ is merged

Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/dml-exec-state.cc
M be/src/service/client-request-state.cc
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogService.thrift
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M tests/query_test/test_iceberg.py
25 files changed, 438 insertions(+), 56 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/16825/5
--
To view, visit http://gerrit.cloudera.org:8080/16825
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4
Gerrit-Change-Number: 16825
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: wangsheng <sky...@163.com>