zhangdove opened a new pull request #1313: URL: https://github.com/apache/iceberg/pull/1313
My use case:

1. table.expireSnapshots().cleanExpiredFiles(false).commit() (see https://github.com/apache/iceberg/pull/1244)
2. actions.removeOrphanFiles().olderThan(t1).execute()

The first step takes about two seconds. However, the second step, which deletes the orphaned files, gets slower and slower as the number of files grows, which is not what I want. If I understand correctly, the files are deleted by a single thread on the Spark driver. Could we move file deletion from the driver to Spark's executors, so that orphaned files are deleted in parallel?
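The request comes down to replacing a sequential delete loop with deletes that run concurrently. The following is a minimal single-JVM sketch of that idea, not the Iceberg implementation and not a Spark-executor version: it fans the deletes out over a java.util.concurrent thread pool instead of distributing them across executors. It takes the delete operation as a Consumer<String>, which keeps it independent of any Iceberg or Hadoop API. The class ParallelOrphanDelete, the method deleteInParallel, and the simulated slowDelete function are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper: parallel deletion on one JVM, standing in for
// distributing deletes to Spark executors.
final class ParallelOrphanDelete {

    // Passes every path to deleteFunc, spreading the calls over numThreads
    // worker threads instead of looping sequentially on a single thread.
    // Returns the number of paths handed to deleteFunc.
    static int deleteInParallel(List<String> paths, Consumer<String> deleteFunc, int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> futures = new ArrayList<>(paths.size());
            for (String path : paths) {
                // Each delete is an independent task, so slow I/O calls overlap in time.
                futures.add(pool.submit(() -> deleteFunc.accept(path)));
            }
            // Wait for every task; get() rethrows any exception thrown by deleteFunc.
            for (Future<?> f : futures) {
                f.get();
            }
            return futures.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            paths.add("/warehouse/db/tbl/data/orphan-" + i + ".parquet");
        }
        // Hypothetical stand-in for a real delete call; it only sleeps
        // to mimic per-file storage latency and deletes nothing.
        Consumer<String> slowDelete = path -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long start = System.nanoTime();
        int n = deleteInParallel(paths, slowDelete, 16);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("processed " + n + " files in " + ms + " ms");
    }
}
```

With 100 simulated deletes of 10 ms each, a sequential loop would need about one second. With 16 threads the wall time should fall to a small fraction of that. The real gain depends on the storage system's latency and any request rate limits it imposes.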
