Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16825

to look at the new patch set (#13).

Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
......................................................................

IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only

This patch adds support for INSERT INTO identity-partitioned Iceberg
tables.

Identity-partitioned Iceberg tables are similar to regular partitioned
tables; they are even stored in the same directory structure. The
difference is that the data files still store the partitioning columns.

The INSERT INTO syntax is similar to the syntax for non-partitioned
tables, i.e.:

INSERT INTO <iceberg_tbl> VALUES (<val1>, <val2>, <val3>, ...);

Or,

INSERT INTO <iceberg_tbl> SELECT <val1>, <val2>, ... FROM <source_tbl>

(please note that we don't use the PARTITION keyword)

The values must be in column order corresponding to the table schema.
Impala will automatically create/find the partitions based on the
Iceberg partition spec.

Partitioned Iceberg tables are stored as non-partitioned tables
in the Hive Metastore (similarly to partitioned Kudu tables). However,
the InsertStmt still generates the partition expressions for them.
These partition expressions are used to shuffle and sort the input
data so we don't end up writing too many files. The HdfsTableSink
also uses the partition expressions to write the data files with
the proper partition paths.

Iceberg is able to parse the partition paths to generate the
corresponding metadata for the partitions. This happens at the
end in IcebergCatalogOpExecutor.
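
The effect of identity partitioning described above can be illustrated with a small standalone Python sketch. This is not Impala code: the function names, the sample rows, and the unescaped `col=value` path format are assumptions made only for illustration. It shows rows being sorted by their partition values, grouped under a directory-style partition path, and kept whole, since the data files still store the partitioning columns.

```python
from collections import defaultdict


def partition_path(row, partition_cols):
    """Build a directory-style partition path (col=value/...) for one row.

    Hypothetical helper: values are used as-is, with no escaping.
    """
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


def group_rows_by_partition(rows, partition_cols):
    """Sort rows by their partition values, then group them by partition path.

    Full rows are kept in each group, mirroring the fact that the
    partitioning columns remain in the data files.
    """
    groups = defaultdict(list)
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in partition_cols))
    for row in ordered:
        groups[partition_path(row, partition_cols)].append(row)
    return dict(groups)


# Sample input; column names and values are made up.
rows = [
    {"id": 1, "year": 2020, "country": "HU"},
    {"id": 2, "year": 2021, "country": "CN"},
    {"id": 3, "year": 2020, "country": "HU"},
]

files = group_rows_by_partition(rows, ["year", "country"])
for path, part_rows in files.items():
    print(path, [r["id"] for r in part_rows])
```

Grouping after the sort means each partition's rows arrive together, so in this sketch every partition maps to a single group rather than many small ones.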

Testing:
 * added planner test to verify shuffling and sorting
 * added negative tests for unsupported features like PARTITION clause
   and non-identity partition transforms
 * e2e tests with partitioned inserts

Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/dml-exec-state.cc
M be/src/service/client-request-state.cc
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogService.thrift
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M tests/query_test/test_iceberg.py
27 files changed, 588 insertions(+), 91 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/16825/13
--
To view, visit http://gerrit.cloudera.org:8080/16825
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4
Gerrit-Change-Number: 16825
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: wangsheng <sky...@163.com>