fafacao86 opened a new issue, #14980: URL: https://github.com/apache/iceberg/issues/14980
### Query engine

_No response_

### Question

Hi community,

We're using S3 as our main storage, and I've noticed that the built-in Iceberg catalog implementations treat DROP TABLE PURGE as a best-effort operation:

1. They only delete the data files that are explicitly listed in the TableMetadata snapshot fetched just before the drop.
2. They perform no CAS-style write on the metadata location. If a second writer commits successfully between "read metadata" and "delete metadata pointer", the new files become orphans: they are invisible to the table (which is now gone).
3. According to the documentation, the orphan file maintenance procedure only works while the table is still visible: you have to load the table first and then run orphan file deletion. https://iceberg.apache.org/docs/1.6.1/docs/maintenance/?h=orphan#delete-orphan-files

Has anyone else hit this gap? Are there plans to introduce a stand-alone "orphan-sweep" tool that can be run against a deleted-table prefix? I would love to hear how others are handling this in production S3 environments.

Thanks!

Below is a snippet from HiveCatalog.java; my questions are in the code comments.

```
@Override
public boolean dropTable(TableIdentifier identifier, boolean purge) {
  if (!isValidIdentifier(identifier)) {
    return false;
  }

  String database = identifier.namespace().level(0);

  TableOperations ops = newTableOps(identifier);
  TableMetadata lastMetadata = null;
  if (purge) {
    try {
      // Might not be the latest metadata if another writer commits before the dropTable call below.
      lastMetadata = ops.current();
    } catch (NotFoundException e) {
      LOG.warn(
          "Failed to load table metadata for table: {}, continuing drop without purge",
          identifier,
          e);
    }
  }

  try {
    clients.run(
        client -> {
          client.dropTable(
              database,
              identifier.name(),
              false /* do not delete data */,
              false /* throw NoSuchObjectException if the table doesn't exist */);
          return null;
        });

    if (purge && lastMetadata != null) {
      // As the comment says: "Drops all data and metadata files referenced by TableMetadata."
      // What about orphan files?
      CatalogUtil.dropTableData(ops.io(), lastMetadata);
    }

    LOG.info("Dropped table: {}", identifier);
    return true;

  } catch (NoSuchTableException | NoSuchObjectException e) {
    LOG.info("Skipping drop, table does not exist: {}", identifier, e);
    return false;

  } catch (TException e) {
    throw new RuntimeException("Failed to drop " + identifier, e);

  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new RuntimeException("Interrupted in call to dropTable", e);
  }
}
```
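
To make the "orphan-sweep" idea concrete, here is a minimal sketch of the selection step such a tool might perform. It is not an Iceberg API: the class `OrphanSweepSketch`, the method `findSweepCandidates`, and the example paths are all hypothetical, and the object listing is modeled as an in-memory map of path to last-modified time. In a real deployment the listing would come from the object store (for example an S3 ListObjectsV2 call over the dropped table's prefix). Since the table is already gone, everything left under its prefix is a candidate; the age cutoff is an assumed safeguard against deleting files from a writer that is still in flight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not part of Iceberg: picks leftover files under a dropped table's prefix. */
class OrphanSweepSketch {

  /**
   * Returns, in sorted order, the paths under tablePrefix whose last-modified
   * timestamp is strictly older than cutoffMillis.
   *
   * @param listing path -> last-modified time in epoch millis (stand-in for an object-store listing)
   */
  static List<String> findSweepCandidates(
      Map<String, Long> listing, String tablePrefix, long cutoffMillis) {
    // Normalize so that "warehouse/db/t" does not also match "warehouse/db/t2/...".
    String prefix = tablePrefix.endsWith("/") ? tablePrefix : tablePrefix + "/";

    List<String> candidates = new ArrayList<>();
    // Copy into a TreeMap so the result order is deterministic.
    for (Map.Entry<String, Long> entry : new TreeMap<>(listing).entrySet()) {
      boolean underPrefix = entry.getKey().startsWith(prefix);
      boolean oldEnough = entry.getValue() < cutoffMillis;
      if (underPrefix && oldEnough) {
        candidates.add(entry.getKey());
      }
    }
    return candidates;
  }

  public static void main(String[] args) {
    Map<String, Long> listing = new TreeMap<>();
    // Files written by a commit that raced with the drop (made-up example paths).
    listing.put("warehouse/db/t/data/late-commit.parquet", 1_000L);
    listing.put("warehouse/db/t/metadata/v3.metadata.json", 1_500L);
    // Too recent: may belong to a writer that has not failed yet.
    listing.put("warehouse/db/t/data/in-flight.parquet", 9_000L);
    // Belongs to a different table that shares a name prefix.
    listing.put("warehouse/db/t2/data/other-table.parquet", 1_000L);

    for (String path : findSweepCandidates(listing, "warehouse/db/t", 5_000L)) {
      System.out.println("would delete: " + path);
    }
  }
}
```

The sketch only selects candidates and deletes nothing. It also assumes each table owns a unique prefix; if two tables can share a location, a prefix-based sweep like this would not be safe.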
