sunxiaojian commented on PR #12754: URL: https://github.com/apache/iceberg/pull/12754#issuecomment-2794184906
@pvary thanks for your review.

I did borrow heavily from the Spark implementation, especially the recursive file listing. Deep recursion can cause performance problems, and it will very likely need optimization later, so it makes sense to move it to core and provide a default implementation in DeleteOrphanFiles. That way, future optimizations will benefit all frameworks. But should this happen after the Flink implementation is complete, with the shared logic then extracted to core in one step?

Regarding ManifestFileBean, I initially wanted to keep it consistent with Spark so that the logic on both sides could later be abstracted into core. In practice, however, IcebergSource can also scan the metadata table directly and use RowData.

I completely agree that it should support only the main branch for now, which makes the review easier.

These are my thoughts on implementing orphan file deletion in Flink. Thanks.
