Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20866 )
Change subject: IMPALA-12412: Support partition evolution in OPTIMIZE statement
......................................................................

IMPALA-12412: Support partition evolution in OPTIMIZE statement

The OPTIMIZE statement is used to execute table maintenance tasks
on Iceberg tables, such as:
 1. compacting small files,
 2. merging delete deltas,
 3. rewriting the table according to the latest schema and
    partition spec.

OptimizeStmt used to serve as an alias for INSERT OVERWRITE. After
this change it works as follows: it creates a source statement that
contains all columns of the table, and all table content is rewritten
to new data files. After the executors finish writing, the Catalog
calls the Iceberg RewriteFiles API to commit the changes: all previous
data and delete files are excluded from the next snapshot, and all
newly written data files are added to it. The old files remain
accessible via time travel to older snapshots of the table.

By default, Impala has as many file writers as query fragment
instances, so it can write too many files for unpartitioned tables.
For smaller tables this can be limited by setting the MAX_FS_WRITERS
query option.

Authorization: OPTIMIZE TABLE requires ALL privileges.

Limitations: all limitations on writing Iceberg tables apply.
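The snapshot-level effect of the new commit path (every previous data and delete file leaves the next snapshot, only the newly written data files enter it, and older snapshots stay reachable through time travel) can be illustrated with a small Java model. This is a hypothetical sketch, not Impala or Iceberg code: the class `OptimizeCommitModel`, its `Snapshot` record, and the file names are illustrative assumptions, and the real commit goes through Iceberg's RewriteFiles API in the Catalog.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical toy model of the snapshot-level effect of OPTIMIZE.
 * Not Impala or Iceberg code: it only mimics the semantics described
 * in the commit message.
 */
public class OptimizeCommitModel {

  /** One snapshot: the data and delete files visible in it. */
  record Snapshot(long id, Set<String> dataFiles, Set<String> deleteFiles) {}

  // All committed snapshots, oldest first; kept so old ones remain readable.
  private final List<Snapshot> history = new ArrayList<>();

  public OptimizeCommitModel(Snapshot initial) {
    history.add(initial);
  }

  /** The latest snapshot, i.e. what a normal query reads. */
  public Snapshot current() {
    return history.get(history.size() - 1);
  }

  /** Time travel: returns an older snapshot by its id. */
  public Snapshot snapshotById(long id) {
    for (Snapshot s : history) {
      if (s.id() == id) {
        return s;
      }
    }
    throw new IllegalArgumentException("no such snapshot: " + id);
  }

  /**
   * Models the rewrite commit: every data and delete file of the current
   * snapshot is excluded, and only the newly written data files are added.
   */
  public Snapshot commitRewrite(Set<String> newDataFiles) {
    Snapshot next = new Snapshot(current().id() + 1,
        Set.copyOf(newDataFiles), Set.of());
    history.add(next);
    return next;
  }

  public static void main(String[] args) {
    // Two small data files plus one delete file before OPTIMIZE.
    OptimizeCommitModel table = new OptimizeCommitModel(new Snapshot(1,
        Set.of("small-1.parq", "small-2.parq"), Set.of("delete-1.parq")));
    Snapshot after = table.commitRewrite(Set.of("compacted-1.parq"));
    System.out.println(after.dataFiles());                   // [compacted-1.parq]
    System.out.println(after.deleteFiles().isEmpty());       // true
    System.out.println(table.snapshotById(1).deleteFiles()); // [delete-1.parq]
  }
}
```

The model keeps every snapshot in a list so that the "old files remain accessible via time travel" property is visible directly; a real table would instead rely on Iceberg metadata and snapshot expiry.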
Testing:
 - E2E tests:
   - schema evolution
   - partition evolution
   - UPDATE/DELETE
   - time travel
   - table history
   - negative tests
   - Ranger tests for authorization
 - FE:
   - Planner test:
     - sorting order
     - MAX_FS_WRITERS
     - partitioned exchange
   - Parser test

Change-Id: I65a0c8529d274afff38ccd582f1b8a857716b1b5
Reviewed-on: http://gerrit.cloudera.org:8080/20866
Reviewed-by: Daniel Becker <daniel.bec...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/service/client-request-state.cc
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-optimize.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
M tests/query_test/test_iceberg.py
20 files changed, 643 insertions(+), 266 deletions(-)

Approvals:
  Daniel Becker: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/20866
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I65a0c8529d274afff38ccd582f1b8a857716b1b5
Gerrit-Change-Number: 20866
Gerrit-PatchSet: 12
Gerrit-Owner: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>