pvary commented on PR #12754: URL: https://github.com/apache/iceberg/pull/12754#issuecomment-2792359044
Thanks for the PR @sunxiaojian.

As mentioned on the linked issue (https://github.com/apache/iceberg/issues/12674#issuecomment-2761625835), we are working on Flink Table Maintenance, and a delete orphan files task is planned there. Implementing the feature there would better serve future users, as orphan file removal could then be used in multiple ways:
- Standalone action
- Automatic table maintenance in Flink sinks
- Externally scheduled maintenance tasks

I have concerns about duplicating code from the Spark action, like the file name normalization, recursive listing, etc. When I discussed this with @mxm, he suggested that this should perhaps also be part of Iceberg core planning.

ManifestFileBean seems a bit too specific to me. This is basically a table scan for a metadata table. So if we accept returning `RowData` instead of a specific object, we can reuse the planning and the reading in other places later, for example when we need to read the manifest files for metadata compaction.

In Iceberg we usually create the first PR only for the main branch of the supported engine (Flink 1.20) and do the review on that. After that version is merged, we create a separate PR which backports the changes to the older versions (currently Flink 1.19/1.18). This helps the reviewers (smaller amount of code) and the contributor too, as the requested changes need to be applied to only a single version.

Thanks,
Peter
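For context on the duplication concern, the core of orphan file detection is comparing the files found by listing the table location against the files referenced by table metadata, after normalizing locations so that equivalent URIs match. The sketch below is hypothetical and only illustrates that idea: the class name, the `EQUAL_SCHEMES` mapping (treating `s3a`/`s3n` as `s3`), and the assumption that the listed and referenced sets are already available are all illustrative, not the Spark action's or Iceberg's actual API.

```java
import java.net.URI;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical sketch of orphan detection with location normalization.
final class OrphanFileSketch {
  // Assumed scheme equivalences; a real implementation would make this configurable.
  static final Map<String, String> EQUAL_SCHEMES = Map.of("s3a", "s3", "s3n", "s3");

  // Normalizes a location to scheme://authority/path so that equivalent schemes compare equal.
  static String normalize(String location) {
    URI uri = URI.create(location);
    String scheme = uri.getScheme() == null
        ? ""
        : EQUAL_SCHEMES.getOrDefault(uri.getScheme(), uri.getScheme());
    String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
    return scheme + "://" + authority + uri.getPath();
  }

  // Returns listed files that no metadata entry references (orphan candidates), sorted.
  static Set<String> findOrphans(Set<String> listed, Set<String> referenced) {
    Set<String> referencedNormalized = referenced.stream()
        .map(OrphanFileSketch::normalize)
        .collect(Collectors.toSet());
    return listed.stream()
        .filter(file -> !referencedNormalized.contains(normalize(file)))
        .collect(Collectors.toCollection(TreeSet::new));
  }

  public static void main(String[] args) {
    // Assumed inputs: the result of a recursive listing and the locations read from metadata.
    Set<String> listed = Set.of(
        "s3a://bucket/db/tbl/data/a.parquet",
        "s3a://bucket/db/tbl/data/b.parquet",
        "s3a://bucket/db/tbl/data/tmp-c.parquet");
    Set<String> referenced = Set.of(
        "s3://bucket/db/tbl/data/a.parquet",
        "s3://bucket/db/tbl/data/b.parquet");
    System.out.println(findOrphans(listed, referenced));
  }
}
```

Keeping the authority in the normalized form avoids treating same-named paths in different buckets as equal. Logic like this is what would be shared if it lived in Iceberg core rather than being copied into each engine.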
