Riza Suminto has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/23042 )
Change subject: IMPALA-12337: Implement delete orphan files for Iceberg table
......................................................................

IMPALA-12337: Implement delete orphan files for Iceberg table

This patch implements the delete orphan files query for Iceberg tables.
The following statement becomes available for Iceberg tables:
- ALTER TABLE <tbl> EXECUTE remove_orphan_files(<timestamp>)

The bulk of the implementation copies Hive's implementation of the
org.apache.iceberg.actions.DeleteOrphanFiles interface (HIVE-27906,
6b2e21a93ef3c1776b689a7953fc59dbf52e4be4), which this patch renames to
ImpalaIcebergDeleteOrphanFiles.java.

Upon execute(), the ImpalaIcebergDeleteOrphanFiles instance gathers the
URIs of all valid data files and Iceberg metadata files using the Iceberg
API. These valid URIs are then compared to a recursive file listing
obtained through the Hadoop FileSystem API under the table's 'data' and
'metadata' directories, respectively. Any unmatched URI from the
FileSystem API listing whose modification time is earlier than the
'olderThanTimestamp' parameter is then removed via the Iceberg FileIO API
of the given Iceberg table. Note that this is a destructive query that
will wipe out any files within the Iceberg table's 'data' and 'metadata'
directories that are not addressable by any valid snapshot.

The execution happens in CatalogD via
IcebergCatalogOpExecutor.alterTableExecuteRemoveOrphanFiles(). CatalogD
supplies CatalogOpExecutor.icebergExecutorService_ as the executor service
to execute the Iceberg API planFiles and the FileIO API for deletion.

Also fixed the toSql() implementation for all ALTER TABLE EXECUTE queries.

Testing:
- Add FE and EE tests.

Change-Id: I5979cdf15048d5a2c4784918533f65f32e888de0
Reviewed-on: http://gerrit.cloudera.org:8080/23042
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Zoltan Borok-Nagy <[email protected]>
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableExecuteExpireSnapshotsStmt.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableExecuteRemoveOrphanFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableExecuteRollbackStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableExecuteStmt.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/ImpalaIcebergDeleteOrphanFiles.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M tests/query_test/test_iceberg.py
11 files changed, 591 insertions(+), 24 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Zoltan Borok-Nagy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/23042
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5979cdf15048d5a2c4784918533f65f32e888de0
Gerrit-Change-Number: 23042
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
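
The orphan selection rule described in the commit message (a listed file is removed only if it is not among the valid URIs and its modification time is earlier than 'olderThanTimestamp') can be illustrated with a minimal sketch. This is not the code from ImpalaIcebergDeleteOrphanFiles.java: the class OrphanFileSketch, the ListedFile type, and findOrphans() are hypothetical names, the listing is an in-memory list rather than a Hadoop FileSystem call, URIs are assumed to match by plain string equality, and nothing is actually deleted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OrphanFileSketch {

  /** Hypothetical stand-in for one entry of a recursive file listing. */
  static final class ListedFile {
    final String uri;
    final long modificationTimeMs;

    ListedFile(String uri, long modificationTimeMs) {
      this.uri = uri;
      this.modificationTimeMs = modificationTimeMs;
    }
  }

  /**
   * Returns URIs from the listing that are not in validUris and whose
   * modification time is strictly earlier than olderThanTimestampMs.
   * Assumption: URIs are compared as plain strings.
   */
  static List<String> findOrphans(
      Set<String> validUris, List<ListedFile> listing, long olderThanTimestampMs) {
    List<String> orphans = new ArrayList<>();
    for (ListedFile f : listing) {
      boolean referenced = validUris.contains(f.uri);
      boolean oldEnough = f.modificationTimeMs < olderThanTimestampMs;
      if (!referenced && oldEnough) {
        orphans.add(f.uri);
      }
    }
    return orphans;
  }

  public static void main(String[] args) {
    // Made-up example data: one referenced file, one old orphan, one recent orphan.
    Set<String> validUris = new HashSet<>(Arrays.asList("hdfs://nn/tbl/data/a.parquet"));
    List<ListedFile> listing = Arrays.asList(
        new ListedFile("hdfs://nn/tbl/data/a.parquet", 1_000L),
        new ListedFile("hdfs://nn/tbl/data/b.parquet", 1_000L),
        new ListedFile("hdfs://nn/tbl/data/c.parquet", 9_000L));

    // Only b.parquet is both unreferenced and older than the cutoff.
    System.out.println(findOrphans(validUris, listing, 5_000L));
  }
}
```

In this sketch the recently written c.parquet is kept even though it is unreferenced, which shows why the timestamp argument matters: a file that is newer than the cutoff may belong to a write that has not been committed yet.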
