Re: [PR] Flink: supports clean orphan files [iceberg]

via GitHub Thu, 10 Apr 2025 03:57:59 -0700


pvary commented on PR #12754:
URL: https://github.com/apache/iceberg/pull/12754#issuecomment-2792359044


   Thanks for the PR @sunxiaojian.
   As mentioned on the linked issue 
(https://github.com/apache/iceberg/issues/12674#issuecomment-2761625835) we are 
working on Flink Table Maintenance. There is a planned delete orphan files task 
there.
   
   Implementing the feature there would better serve future users, as orphan 
file removal could be used in multiple ways:
   - Standalone action
   - Automatic table maintenance in Flink sinks
   - Externally scheduled maintenance tasks
   
   I have concerns about duplicating code from the Spark action, such as the 
file name normalization, recursive listing, etc. When I discussed this with 
@mxm, he suggested that maybe this should also be part of Iceberg core planning.
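
   To make the duplication concern concrete, here is a minimal sketch of what 
engine-independent file name normalization and orphan detection could look 
like. It is not the Spark action's code: the class name `OrphanFileSketch`, 
the scheme alias table, and both method names are hypothetical, and it relies 
only on `java.net.URI` from the JDK.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only; not part of Iceberg.
final class OrphanFileSketch {
  // Assumed alias table: schemes that should compare as equal.
  private static final Map<String, String> SCHEME_ALIASES =
      Map.of("s3a", "s3", "s3n", "s3");

  private OrphanFileSketch() {}

  // Reduce a location to a canonical "scheme://authority/path" string.
  // Note: URI.create throws IllegalArgumentException for locations that
  // contain characters such as unescaped spaces.
  public static String normalize(String location) {
    URI uri = URI.create(location).normalize(); // resolves "." and ".." segments
    String scheme =
        uri.getScheme() == null ? "file" : uri.getScheme().toLowerCase(Locale.ROOT);
    scheme = SCHEME_ALIASES.getOrDefault(scheme, scheme);
    String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
    return scheme + "://" + authority + uri.getPath();
  }

  // Return the listed locations that no referenced location normalizes to.
  public static Set<String> findOrphans(List<String> listed, List<String> referenced) {
    Set<String> known = new HashSet<>();
    for (String location : referenced) {
      known.add(normalize(location));
    }
    Set<String> orphans = new TreeSet<>();
    for (String location : listed) {
      if (!known.contains(normalize(location))) {
        orphans.add(location); // keep the original string so it can be deleted as listed
      }
    }
    return orphans;
  }
}
```

   Logic of this kind does not depend on the engine, which is the argument for 
keeping it in one shared place instead of copying it into each engine module.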
   
   ManifestFileBean seems a bit too specific to me. This is basically a table 
scan for a metadata table. So if we agree to return `RowData` instead of a 
specific object, we can reuse the planning and the reading in other places 
later, for example when we need to read the manifest files for metadata 
compaction.
   
   In Iceberg we usually create the first PR only for the main branch of the 
supported engine (Flink 1.20), and do the review on that. After that version is 
merged, we create a separate PR which backports the changes to the older 
versions (currently Flink 1.19/1.18). This helps the reviewers (less code to 
review), and the contributor too, as the requested changes need to be applied 
to only a single version.
   
   Thanks,
   Peter
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


