ConeyLiu commented on issue #4159:
URL: https://github.com/apache/iceberg/issues/4159#issuecomment-1048551526


   As @aokolnychyi suggested in
[3056](https://github.com/apache/iceberg/pull/3056), we use
`DeleteReachableFiles` to purge table data, which could provide much better
scalability and performance. However, there are still some drawbacks that need
to be considered:
   
   1. Different catalogs implement drop table differently. For
example, `HadoopCatalog`/`HadoopTables` delete the whole warehouse directly and
ignore the purge argument. In this case, we could not use
`DeleteReachableFiles`.
   2. A user's own catalog may have customized features, such as sending
events/metrics when purging data. With `DeleteReachableFiles`, those operations
would be skipped.
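
   To illustrate the second drawback, here is a minimal sketch using
hypothetical stand-in types (`SketchCatalog`, `EventingCatalog`, and
`DropPaths` are not Iceberg classes). It assumes a custom catalog that records
an event whenever it purges data, and compares that with deleting the files
outside the catalog, the way an external action would:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a catalog; NOT the Iceberg Catalog interface.
interface SketchCatalog {
  boolean dropTable(String name, boolean purge);
}

// Assumed custom catalog that emits an event when it purges table data.
class EventingCatalog implements SketchCatalog {
  final List<String> events = new ArrayList<>();
  final List<String> files; // files owned by the single table in this sketch

  EventingCatalog(List<String> files) {
    this.files = new ArrayList<>(files);
  }

  @Override
  public boolean dropTable(String name, boolean purge) {
    if (purge) {
      files.clear();                // the catalog deletes the data itself
      events.add("purged:" + name); // customized behavior tied to the purge
    }
    return true;
  }
}

class DropPaths {
  // Path A: the catalog purges, so its custom hook runs.
  static void dropViaCatalog(EventingCatalog catalog, String name) {
    catalog.dropTable(name, true);
  }

  // Path B: files are removed outside the catalog (stand-in for an external
  // delete action), then the table is dropped without purge. The hook never runs.
  static void dropViaExternalAction(EventingCatalog catalog, String name) {
    catalog.files.clear();
    catalog.dropTable(name, false);
  }
}
```

   In both paths the files end up deleted, but only path A leaves an entry in
`events`.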
   
   > I think it should match the removal of reachable files and be consistent 
in all APIs. Once we know locations owned by the table, we may drop them too.
   
   I think this is necessary. We should unify the behavior of drop table
[purge] across the built-in catalogs. We may also need to define an interface
that supports parallel operations (by leveraging a distributed engine such as
Spark, Flink, or others).
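
   As a rough sketch of what such an interface might look like (all names
here, `ParallelPurge` and `LocalParallelPurge`, are hypothetical and not part
of Iceberg), a catalog could accept a pluggable purge strategy. The local
implementation below uses a parallel stream only as a stand-in for a
distributed engine:

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical interface: a pluggable way to delete all files of a table.
interface ParallelPurge {
  // Deletes each distinct file via deleteFunc and returns how many were deleted.
  long purge(List<String> reachableFiles, Consumer<String> deleteFunc);
}

// Local stand-in for an engine-backed implementation (e.g. Spark or Flink).
class LocalParallelPurge implements ParallelPurge {
  @Override
  public long purge(List<String> reachableFiles, Consumer<String> deleteFunc) {
    // De-duplicate first so a file referenced twice is deleted only once.
    List<String> unique =
        reachableFiles.stream().distinct().collect(Collectors.toList());
    // deleteFunc must be thread-safe because it is called concurrently.
    unique.parallelStream().forEach(deleteFunc);
    return unique.size();
  }
}
```

   A catalog-specific `deleteFunc` would also give custom catalogs a place to
keep their own events or metrics.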
   




