qinghui-xu opened a new issue, #6579:
URL: https://github.com/apache/iceberg/issues/6579

   ### Feature Request / Improvement
   
   We have a streaming pipeline writing (upsert) data to a table, and a Spark 
compaction job that rewrites files asynchronously.
   The compaction job fails to commit whenever the streaming pipeline commits a 
snapshot that deletes existing data. To address this we enabled partial 
progress in the compaction job.
   
   What we observe after enabling partial progress (say, with 
`partial-progress.max-commits = 10`):
   - The first few partial commits succeed
   - The streaming job commits an upsert snapshot
   - All subsequent partial commits fail
   
   In our case, the streaming pipeline is constantly writing to all partitions 
at once, so once the first partial commit fails because of a conflict, all 
subsequent partial commits are almost certain to fail as well. It would be 
nice to abort the job sooner to avoid wasting resources on doomed-to-fail 
processing.
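   A minimal sketch of the early-abort behavior being requested, outside of 
Iceberg's actual rewrite implementation. The commit loop, `try_commit` 
callable, and `max_consecutive_failures` threshold are all illustrative names, 
not Iceberg APIs:

   ```python
   class PartialProgressRunner:
       """Illustrative partial-progress commit loop that aborts early
       after too many consecutive failed commits, instead of grinding
       through the remaining file groups that are likely doomed too."""

       def __init__(self, try_commit, max_commits, max_consecutive_failures=3):
           self.try_commit = try_commit  # callable: file group -> bool (True on success)
           self.max_commits = max_commits
           self.max_consecutive_failures = max_consecutive_failures

       def run(self, file_groups):
           committed = 0
           consecutive_failures = 0
           for group in file_groups:
               if committed >= self.max_commits:
                   break
               if self.try_commit(group):
                   committed += 1
                   consecutive_failures = 0  # reset on any success
               else:
                   consecutive_failures += 1
                   if consecutive_failures >= self.max_consecutive_failures:
                       # Conflicts span all partitions, so the remaining
                       # partial commits are almost certain to fail as well.
                       raise RuntimeError(
                           f"aborting after {consecutive_failures} "
                           "consecutive commit conflicts"
                       )
           return committed
   ```

   With the failure pattern from this issue (a few successes, then a 
conflicting upsert snapshot makes every later commit fail), the runner stops 
after a handful of failed attempts rather than retrying every remaining group.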
    
   
   ### Query engine
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
