Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16825

to look at the new patch set (#4).

Change subject: IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables
......................................................................

IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables

This patch adds support for INSERT INTO on identity-partitioned
Iceberg tables.

Identity-partitioned Iceberg tables are similar to regular
partitioned tables; they are even stored in the same directory
structure. The difference is that the data files still store
the partitioning columns.
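
To illustrate the layout described above, here is a minimal sketch in
Python. The helper name partition_path and the sample column names are
hypothetical and not part of the patch; it assumes a plain
name=value directory per partition column and ignores any escaping of
special characters.

```python
def partition_path(row, partition_cols):
    """Build a relative directory such as 'year=2020/month=12' (hypothetical helper)."""
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


row = {"id": 1, "year": 2020, "month": 12}

# Directory that the row's data file would be written under.
path = partition_path(row, ["year", "month"])

# With identity partitioning the data file keeps every column,
# including the ones that also appear in the directory name.
file_columns = list(row.keys())
```

In a regular partitioned table the file would carry only the
non-partition columns; here year and month appear both in the path
and in the file.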

Partitioned Iceberg tables are stored as non-partitioned tables
in the Hive Metastore (similarly to partitioned Kudu tables). However,
the InsertStmt still generates the partition expressions for them.
These partition expressions are used to shuffle and sort the input
data so we don't end up writing too many files. The HdfsTableSink
also uses the partition expressions to write the data files with
the proper partition paths.
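
The effect of shuffling and sorting on the file count can be sketched
with a simplified model. This is not Impala's implementation: the
functions shuffle_by_partition and count_files are hypothetical, and
the model assumes a writer that keeps a single file open and starts a
new one whenever the partition key of the incoming row changes.

```python
from collections import defaultdict


def partition_key(row, partition_cols):
    return tuple(row[c] for c in partition_cols)


def shuffle_by_partition(rows, partition_cols, num_writers):
    """Route each row to a writer chosen by hashing its partition key."""
    writers = defaultdict(list)
    for row in rows:
        key = partition_key(row, partition_cols)
        writers[hash(key) % num_writers].append(row)
    return writers


def count_files(rows, partition_cols):
    """Files opened by one writer that starts a new file on every key change."""
    files = 0
    prev = None
    for row in rows:
        key = partition_key(row, partition_cols)
        if key != prev:
            files += 1
            prev = key
    return files


# Rows alternate between two partitions (year 2019 and year 2020).
rows = [{"id": i, "year": 2019 + i % 2} for i in range(6)]
writers = shuffle_by_partition(rows, ["year"], num_writers=2)

unsorted_files = count_files(rows, ["year"])
sorted_files = count_files(sorted(rows, key=lambda r: r["year"]), ["year"])
```

After the shuffle, all rows of a given partition end up at the same
writer, and sorting lets that writer finish one partition before it
starts the next.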

Iceberg is able to parse the partition paths to generate the
corresponding metadata for the partitions. This happens at the
end in IcebergCatalogOpExecutor.
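
Conceptually, recovering partition values from a path looks like the
sketch below. The function parse_partition_path and the sample file
name are hypothetical; this is not the Iceberg API, and it leaves the
values as strings, whereas a real implementation would also convert
them to the column types.

```python
def parse_partition_path(path):
    """Extract name=value components from a partition path (hypothetical helper)."""
    parts = {}
    for component in path.strip("/").split("/"):
        name, sep, value = component.partition("=")
        if sep:  # skip components without '=', e.g. the data file name
            parts[name] = value
    return parts


values = parse_partition_path("year=2020/month=12/data-0001.parq")
```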

Testing:
 * added planner test to verify shuffling and sorting
 * added negative tests for unsupported features like PARTITION clause
   and non-identity partition transforms
 * e2e tests with partitioned inserts

TODO:
 * The current change includes IMPALA-10384; maybe we should push it
   separately

Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/dml-exec-state.cc
M be/src/service/client-request-state.cc
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogService.thrift
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M tests/query_test/test_iceberg.py
24 files changed, 405 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/16825/4
--
To view, visit http://gerrit.cloudera.org:8080/16825
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4
Gerrit-Change-Number: 16825
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: wangsheng <sky...@163.com>
