aokolnychyi commented on a change in pull request #3056:
URL: https://github.com/apache/iceberg/pull/3056#discussion_r809664272
##########
File path:
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -244,10 +248,25 @@ public SparkTable alterTable(Identifier ident, TableChange... changes) throws No
   @Override
   public boolean dropTable(Identifier ident) {
+    return dropTableInternal(ident, false);
+  }
+
+  @Override
+  public boolean purgeTable(Identifier ident) {
+    return dropTableInternal(ident, true);
+  }
+
+  private boolean dropTableInternal(Identifier ident, boolean purge) {
Review comment:
I think we should use the `DeleteReachableFiles` action, which was added
recently for this use case. I doubt `CatalogUtil` will perform well on a
reasonably sized table. The action should scale much better because it uses
distributed metadata tables under the hood.
Here is what we use internally.
```
@Override
public boolean dropTable(Identifier ident) {
  return dropTableWithoutPurging(ident);
}

@Override
public boolean purgeTable(Identifier ident) {
  try {
    Table table = load(ident).first();
    ValidationException.check(
        PropertyUtil.propertyAsBoolean(table.properties(), GC_ENABLED, GC_ENABLED_DEFAULT),
        "Cannot purge table: GC is disabled (deleting files may corrupt other tables)");
    String metadataFileLocation =
        ((HasTableOperations) table).operations().current().metadataFileLocation();

    boolean dropped = dropTableWithoutPurging(ident);

    if (dropped) {
      SparkActions actions = SparkActions.get();
      actions.deleteReachableFiles(metadataFileLocation)
          .io(table.io())
          .execute();
    }

    return dropped;
  } catch (org.apache.iceberg.exceptions.NoSuchTableException e) {
    return false;
  }
}

private boolean dropTableWithoutPurging(Identifier ident) {
  if (isPathIdentifier(ident)) {
    return tables.dropTable(((PathIdentifier) ident).location(), false);
  } else {
    return icebergCatalog.dropTable(buildIdentifier(ident), false);
  }
}
```
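The ordering in `purgeTable` matters: the GC check and the metadata file location are both read before the drop, and the file deletion only runs if the drop succeeded. To make that order easy to check without Spark or Iceberg on the classpath, here is a rough sketch. `PurgeOrderSketch`, `FakeCatalog`, and `FakeTable` are hypothetical stand-ins I made up for illustration, not Iceberg classes, and the `gc.enabled` key with a default of `true` is assumed to match `GC_ENABLED` / `GC_ENABLED_DEFAULT`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the purge flow; none of these types come from Iceberg.
final class PurgeOrderSketch {

  /** Stand-in for a loaded table: its properties and current metadata file. */
  static final class FakeTable {
    final Map<String, String> properties;
    final String metadataFileLocation;

    FakeTable(Map<String, String> properties, String metadataFileLocation) {
      this.properties = properties;
      this.metadataFileLocation = metadataFileLocation;
    }
  }

  /** In-memory stand-in for a catalog that records side effects in order. */
  static final class FakeCatalog {
    final Map<String, FakeTable> tables = new HashMap<>();
    final List<String> events = new ArrayList<>();

    /** Returns null when the table does not exist. */
    FakeTable load(String name) {
      return tables.get(name);
    }

    /** Removes the catalog entry only; files are left in place. */
    boolean dropWithoutPurging(String name) {
      boolean dropped = tables.remove(name) != null;
      if (dropped) {
        events.add("drop " + name);
      }
      return dropped;
    }

    /** Placeholder for the distributed delete action. */
    void deleteReachableFiles(String metadataLocation) {
      events.add("delete-reachable " + metadataLocation);
    }
  }

  static boolean purgeTable(FakeCatalog catalog, String name) {
    FakeTable table = catalog.load(name);
    if (table == null) {
      return false; // mirrors the NoSuchTableException branch
    }

    // Assumed property key and default; refuse to purge when GC is disabled.
    boolean gcEnabled =
        Boolean.parseBoolean(table.properties.getOrDefault("gc.enabled", "true"));
    if (!gcEnabled) {
      throw new IllegalStateException("Cannot purge table: GC is disabled");
    }

    // Capture the location first: after the drop the catalog can no longer resolve it.
    String metadataLocation = table.metadataFileLocation;

    boolean dropped = catalog.dropWithoutPurging(name);
    if (dropped) {
      catalog.deleteReachableFiles(metadataLocation);
    }
    return dropped;
  }
}
```

With a table registered in `FakeCatalog`, one `purgeTable` call should record the drop event followed by the delete event, and a second call on the same name should return `false` without recording anything.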
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.