qinghui-xu opened a new issue, #6579:
URL: https://github.com/apache/iceberg/issues/6579
### Feature Request / Improvement
We have a streaming pipeline serving (upsert) data to a table, and a Spark
compaction job that rewrites files asynchronously.
The compaction job fails to commit when the streaming pipeline commits to the
table with deletions against existing data. To address this we enabled partial
progress in the compaction job.
What we observe after enabling partial progress (say, with
`partial-progress.max-commits = 10`):
- First few partial commits succeeded
- Streaming job commits a snapshot with upsert
- All subsequent partial commits failed
In our case the streaming pipeline writes to all partitions constantly, so
once a first partial commit fails because of a conflict, all subsequent
partial commits are almost certain to fail as well. It would be nice to abort
the job sooner instead of wasting resources on processing that is doomed to
fail.
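The requested behavior could look roughly like the sketch below. This is a minimal illustration, not Iceberg code: `run_partial_progress_rewrite`, `try_commit`, and `max_failed_commits` are hypothetical names, and the real rewrite action would need the same idea wired into its own commit loop.

```python
def run_partial_progress_rewrite(file_groups, try_commit, max_failed_commits=3):
    """Commit rewritten file groups one by one, aborting early after a
    run of consecutive commit failures (hypothetical sketch)."""
    committed = []
    consecutive_failures = 0
    for group in file_groups:
        try:
            try_commit(group)
            committed.append(group)
            consecutive_failures = 0  # reset on any success
        except Exception:
            consecutive_failures += 1
            if consecutive_failures >= max_failed_commits:
                # Abort: under constant conflicting writes, the remaining
                # partial commits are almost certain to fail too.
                break
    return committed
```

The key design point is counting *consecutive* failures rather than total ones, so an occasional transient conflict does not abort a job that is otherwise making progress.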
### Query engine
None
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]