pratikpandey21 opened a new issue, #15369:
URL: https://github.com/apache/iceberg/issues/15369

   ### Feature Request / Improvement
   
   The current implementation of rewrite data files with 
`removeDanglingDeletes` only works on the main branch of the Iceberg table.
   
   The RemoveDanglingDeleteFiles action has two issues:
   
     1. Unpartitioned tables are silently skipped
   
     The execute() method returns early with an empty result for unpartitioned 
tables:
   
     ```
     if (table.specs().size() == 1 && table.spec().isUnpartitioned()) {
       return ImmutableRemoveDanglingDeleteFiles.Result.builder()
           .removedDeleteFiles(Collections.emptyList())
           .build();
     }
     ```
   
   
     2. No branch support
   
     The action always operates on the main branch. There is no API to target a 
specific branch, and the Spark implementation reads metadata tables without 
branch scoping. This also means that when RewriteDataFilesSparkAction invokes 
RemoveDanglingDeleteFiles internally (via the remove-dangling-deletes option), 
it ignores the branch that the rewrite is targeting.
   
   
   
   **Proposed Changes:**
   
     API (RemoveDanglingDeleteFiles):
     - Add a toBranch(String branch) method to allow targeting a specific 
branch.
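
As a rough illustration of the proposed method, here is a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: the nested `RemoveDanglingDeleteFiles` interface only mimics the shape of the real action, `InMemoryAction` and the file name are made up, and defaulting to `main` when `toBranch` is not called is an assumption about the intended behavior, not the actual Iceberg implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class ToBranchSketch {

  // Hypothetical stand-in for the action interface, reduced to the proposed method.
  interface RemoveDanglingDeleteFiles {
    RemoveDanglingDeleteFiles toBranch(String branch);

    List<String> execute();
  }

  // In-memory stand-in: maps a branch name to the dangling delete files visible on it.
  static class InMemoryAction implements RemoveDanglingDeleteFiles {
    private final Map<String, List<String>> danglingByBranch;
    private String branch = "main"; // assumed default when toBranch is never called

    InMemoryAction(Map<String, List<String>> danglingByBranch) {
      this.danglingByBranch = danglingByBranch;
    }

    @Override
    public RemoveDanglingDeleteFiles toBranch(String branch) {
      this.branch = Objects.requireNonNull(branch, "branch must not be null");
      return this; // fluent style, so the call can be chained before execute()
    }

    @Override
    public List<String> execute() {
      // Only delete files scoped to the selected branch are considered.
      return danglingByBranch.getOrDefault(branch, List.of());
    }
  }

  public static void main(String[] args) {
    Map<String, List<String>> dangling = Map.of("audit", List.of("eq-delete-1.parquet"));

    // Without toBranch the sketch targets main, which has nothing to remove here.
    System.out.println(new InMemoryAction(dangling).execute());

    // With toBranch the action is scoped to the audit branch.
    System.out.println(new InMemoryAction(dangling).toBranch("audit").execute());
  }
}
```

In this sketch the caller picks the branch before `execute()`, which is also how RewriteDataFilesSparkAction could pass along the branch it is rewriting when it invokes the action internally.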
   
   
   
   
   **Background:**
   - We use Flink to stream writes into Iceberg following the 
Write-Audit-Publish pattern, so Flink writes to a branch.
   - A periodic Spark job reads the latest changes on the branch, runs an 
audit, and tries to merge/fast-forward to main.
   - The table is then synced to Snowflake.
   
   Because Flink streaming in upsert mode generates `equality-deletes` on the 
branch, they are also present in the metadata on main after the fast-forward.
   
   Snowflake, however, doesn't support equality deletes for managed tables, so 
we have to remove them from the main branch. That is why we need the ability 
to remove dangling deletes from the branch before we fast-forward to main.
   
   
   
   
   ### Query engine
   
   Spark
   
   ### Willingness to contribute
   
   - [ ] I can contribute this improvement/feature independently
   - [x] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

