Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23042 )

Change subject: IMPALA-12337: Implement delete orphan files for Iceberg table
......................................................................


Patch Set 6:

(5 comments)

Thanks for working on this!

http://gerrit.cloudera.org:8080/#/c/23042/6/fe/src/main/java/org/apache/impala/analysis/AlterTableExecuteRemoveOrphanFilesStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableExecuteRemoveOrphanFilesStmt.java:

http://gerrit.cloudera.org:8080/#/c/23042/6/fe/src/main/java/org/apache/impala/analysis/AlterTableExecuteRemoveOrphanFilesStmt.java@38
PS6, Line 38: as NOW()
When we document this feature we should make it very clear not to use NOW() as 
parameter as it will remove files of in-progress operations.

Or, it would be nice to have a limit on the timestamp, e.g. only allow 
timestamps earlier than "NOW() - interval 1 hours". But that would require the 
introduction of another tunable, at least for testing, e.g. 
"min_orphan_file_age_minutes".


http://gerrit.cloudera.org:8080/#/c/23042/6/fe/src/main/java/org/apache/impala/analysis/AlterTableExecuteRemoveOrphanFilesStmt.java@79
PS6, Line 79:     if (timestampExpr != null) {
nit: to limit nesting, we could just:

 if (timestampExpr == null) {
   throw ...
 }

 try {
   olderThanMillis_ =
   ...


http://gerrit.cloudera.org:8080/#/c/23042/6/fe/src/main/java/org/apache/impala/catalog/iceberg/ImpalaIcebergDeleteOrphanFiles.java
File 
fe/src/main/java/org/apache/impala/catalog/iceberg/ImpalaIcebergDeleteOrphanFiles.java:

http://gerrit.cloudera.org:8080/#/c/23042/6/fe/src/main/java/org/apache/impala/catalog/iceberg/ImpalaIcebergDeleteOrphanFiles.java@61
PS6, Line 61: This is copied from HiveIcebergDeleteOrphanFiles.java with 
minimal changes
            :  * (HIVE-27906).
Please also include the git hash, so later we can quickly check for changes.


http://gerrit.cloudera.org:8080/#/c/23042/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/23042/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1687
PS6, Line 1687:       iceTxn.commitTransaction();
I just wonder if we should return early after L1631, this way we might save an 
unnecessary table reload. Though we also won't update TBL_PROP_LAST_DDL_TIME, 
but it might be OK for this case.


http://gerrit.cloudera.org:8080/#/c/23042/6/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/23042/6/tests/query_test/test_iceberg.py@464
PS6, Line 464: @SkipIf.not_dfs
Can we make this test work on S3 as well? Since we are removing files I think 
it's critical to test it against as many storage systems as we can.



--
To view, visit http://gerrit.cloudera.org:8080/23042
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5979cdf15048d5a2c4784918533f65f32e888de0
Gerrit-Change-Number: 23042
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 24 Jun 2025 16:12:26 +0000
Gerrit-HasComments: Yes

Reply via email to