fafacao86 opened a new issue, #14980:
URL: https://github.com/apache/iceberg/issues/14980

   ### Query engine
   
   _No response_
   
   ### Question
   
   Hi community,
   We're using S3 as our main storage, and I've noticed that the built-in Iceberg catalog implementations treat DROP TABLE PURGE as a best-effort operation:
   1. They delete only the data files that are explicitly listed in the TableMetadata that was fetched just before the drop.
   2. They perform no CAS-style check on the metadata location. If a second writer commits successfully between "read metadata" and "delete metadata pointer", the files it added become orphans: nothing references them, because the table is now gone.
   3. According to the documentation, the orphan file maintenance procedure only works while the table is still visible: you have to load the table first and then run orphan file deletion. 
https://iceberg.apache.org/docs/1.6.1/docs/maintenance/?h=orphan#delete-orphan-files

   Has anyone else hit this gap? Are there plans to introduce a stand-alone "orphan-sweep" tool that can be run against a deleted table's prefix?
   I would love to hear how others are handling this in production S3 environments. Thanks!
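
To make the "orphan-sweep" idea concrete, here is a minimal sketch of the selection step such a tool might perform. It is not an existing Iceberg tool or API: `OrphanSweepSketch`, `ListedFile`, and `selectForDeletion` are hypothetical names, and the listing is passed in as plain data so the example stays self-contained. In a real tool the listing would presumably come from the object store (Iceberg's `SupportsPrefixOperations` interface, which `S3FileIO` implements as far as I know, looks like a candidate), and the age threshold is an assumed safety margin borrowed from how orphan file removal is usually run.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: every name here is illustrative, none is an Iceberg API.
class OrphanSweepSketch {

  /** One object returned by listing the dropped table's prefix. */
  static final class ListedFile {
    final String location;
    final Instant lastModified;

    ListedFile(String location, Instant lastModified) {
      this.location = location;
      this.lastModified = lastModified;
    }
  }

  /**
   * Selects files that are under the table prefix, are not referenced by any
   * live table, and are older than minAge (assumed safety margin).
   */
  static List<String> selectForDeletion(
      List<ListedFile> listing,
      String tablePrefix,
      Set<String> stillReferenced,
      Instant now,
      Duration minAge) {
    Instant cutoff = now.minus(minAge);
    List<String> toDelete = new ArrayList<>();
    for (ListedFile file : listing) {
      boolean underPrefix = file.location.startsWith(tablePrefix);
      boolean unreferenced = !stillReferenced.contains(file.location);
      boolean oldEnough = file.lastModified.isBefore(cutoff);
      if (underPrefix && unreferenced && oldEnough) {
        toDelete.add(file.location);
      }
    }
    return toDelete;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2024-01-10T00:00:00Z");
    Instant old = Instant.parse("2024-01-01T00:00:00Z");
    List<ListedFile> listing = List.of(
        new ListedFile("s3://bucket/db/t/data/old-orphan.parquet", old),
        new ListedFile("s3://bucket/db/t/data/fresh.parquet", Instant.parse("2024-01-09T23:00:00Z")),
        new ListedFile("s3://bucket/db/t/data/kept.parquet", old),
        new ListedFile("s3://bucket/db/other/data/x.parquet", old));
    // Files referenced by a live table that shares the prefix (may be empty).
    Set<String> stillReferenced = Set.of("s3://bucket/db/t/data/kept.parquet");

    // prints [s3://bucket/db/t/data/old-orphan.parquet]
    System.out.println(
        selectForDeletion(listing, "s3://bucket/db/t/", stillReferenced, now, Duration.ofDays(3)));
  }
}
```

The `stillReferenced` set is there for the case where the prefix is shared, or where a table has been re-created at the same location. If the prefix is known to be exclusive to the dropped table, it can be empty.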
   
   Below is a snippet from HiveCatalog.java; my questions are in the code comments.
   ```java
     @Override
     public boolean dropTable(TableIdentifier identifier, boolean purge) {
       if (!isValidIdentifier(identifier)) {
         return false;
       }
   
       String database = identifier.namespace().level(0);
   
       TableOperations ops = newTableOps(identifier);
       TableMetadata lastMetadata = null;
       if (purge) {
         try {
           // This might not be the latest metadata if another writer commits
           // before the dropTable call below.
           lastMetadata = ops.current();
         } catch (NotFoundException e) {
           LOG.warn(
               "Failed to load table metadata for table: {}, continuing drop without purge",
               identifier,
               e);
         }
       }
   
       try {
         clients.run(
             client -> {
               client.dropTable(
                   database,
                   identifier.name(),
                   false /* do not delete data */,
                   false /* throw NoSuchObjectException if the table doesn't exist */);
               return null;
             });
   
         if (purge && lastMetadata != null) {
           // As its own comment says, dropTableData drops all data and metadata
           // files referenced by TableMetadata.
           // What about orphan files?
           CatalogUtil.dropTableData(ops.io(), lastMetadata);
         }
   
         LOG.info("Dropped table: {}", identifier);
         return true;
   
       } catch (NoSuchTableException | NoSuchObjectException e) {
         LOG.info("Skipping drop, table does not exist: {}", identifier, e);
         return false;
   
       } catch (TException e) {
         throw new RuntimeException("Failed to drop " + identifier, e);
   
       } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
         throw new RuntimeException("Interrupted in call to dropTable", e);
       }
     }
   ```
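
The race described in the first code comment can be reproduced with a toy model. The sketch below is only an illustration of the difference between an unconditional drop and a compare-and-swap style drop. `DropRaceSketch`, `ToyCatalog`, and their methods are hypothetical names, the metadata pointer is a plain string, and nothing here claims that the Hive Metastore offers (or lacks) a conditional drop.

```java
import java.util.Objects;

// Hypothetical toy model of a catalog pointer; not Iceberg or Hive code.
class DropRaceSketch {

  static final class ToyCatalog {
    private String metadataLocation = "v1.metadata.json";

    synchronized String current() {
      return metadataLocation;
    }

    /** Writer commit: swaps the pointer only if it still equals expected. */
    synchronized boolean commit(String expected, String next) {
      if (!Objects.equals(metadataLocation, expected)) {
        return false;
      }
      metadataLocation = next;
      return true;
    }

    /** Unconditional drop: removes whatever pointer is there and returns it. */
    synchronized String dropUnconditionally() {
      String removed = metadataLocation;
      metadataLocation = null;
      return removed;
    }

    /** CAS-style drop: succeeds only if the pointer is what the dropper read. */
    synchronized boolean dropIfUnchanged(String expected) {
      return commit(expected, null);
    }
  }

  public static void main(String[] args) {
    ToyCatalog catalog = new ToyCatalog();

    String readByDropper = catalog.current();               // dropper reads v1
    catalog.commit("v1.metadata.json", "v2.metadata.json"); // concurrent writer commits v2

    // The conditional drop notices the change and refuses.
    System.out.println("conditional drop succeeded: " + catalog.dropIfUnchanged(readByDropper));
    // prints: conditional drop succeeded: false

    // The unconditional drop removes v2, but the purge would use v1's file list,
    // so files added by v2 are never deleted.
    String actuallyDropped = catalog.dropUnconditionally();
    System.out.println("purge based on " + readByDropper + ", dropped " + actuallyDropped);
    // prints: purge based on v1.metadata.json, dropped v2.metadata.json
  }
}
```

With a conditional drop, the dropper could reload the metadata and retry, so the purge would always run against the metadata that was actually removed.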


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

