Re: [PR] Flink: supports clean orphan files [iceberg]

via GitHub Thu, 10 Apr 2025 08:15:46 -0700


sunxiaojian commented on PR #12754:
URL: https://github.com/apache/iceberg/pull/12754#issuecomment-2794184906


   @pvary thanks for your review.
   
   I did borrow heavily from the Spark implementation, especially the recursive 
file listing. Deep recursion can cause performance issues, and there is likely 
room for optimization later. So it makes sense to move it to core and provide a 
default implementation in DeleteOrphanFiles, so that future optimizations 
benefit all frameworks. But should this be done after the Flink implementation 
is complete, with the logic then extracted to core in one pass?
   
   Regarding ManifestFileBean, I initially wanted to keep it consistent with 
Spark so that the logic on both sides could be abstracted into core later. 
In practice, however, IcebergSource can also scan the metadata table directly 
and use RowData.
   
   I completely agree that it should only support the main branch for now, 
which makes it easier to review.
   
   These are my thoughts on implementing orphan file cleanup in Flink. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


