t3hw commented on PR #15710:
URL: https://github.com/apache/iceberg/pull/15710#issuecomment-4106491280

   Thanks for the PR, @koodin9 — and for the feedback on #15651 that identified 
the data loss scenario. That prompted a full redesign on my end. Your core 
approach here is exactly right, and I've built additional hardening on top of 
the same foundation. Would love to collaborate — here's what I've added:
   
   - **Per-table group commits**: tables commit in parallel, while commitId 
groups commit sequentially within each table, so a stale-group failure blocks 
only its own table
   - **Selective buffer draining**: only successfully committed envelopes are 
removed; failed groups are retried on the next cycle
   - **Error escalation**: configurable blocking retries → failure policy 
(fail/non-blocking) → TTL eviction with orphaned file path logging
   - **Per-group offsets**: stale groups write their own envelope offsets (no 
null guards needed), preventing offset poisoning
   - **Partial offset advancement**: on partial success, consumer offsets 
advance only as far as the minimum uncommitted offset
   - **JMX monitoring**: `CommitStateMXBean` for stale group count, buffer 
size, eviction metrics
   - **8 new tests** covering group ordering, selective removal, and failure 
scenarios
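
To make the selective draining and partial offset advancement items concrete, here is a minimal sketch of how the two could fit together. It is not the code from either PR: `PartialCommitSketch`, `Envelope`, and `drainAndComputeOffsets` are hypothetical names, and the sketch assumes each buffered envelope carries a source partition, an offset, and the id of the commit group it belongs to.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: all names here are hypothetical, not the PR's classes.
final class PartialCommitSketch {

  /** A buffered message: source partition, offset, and its commit group. */
  static final class Envelope {
    final int partition;
    final long offset;
    final String groupId;

    Envelope(int partition, long offset, String groupId) {
      this.partition = partition;
      this.offset = offset;
      this.groupId = groupId;
    }
  }

  /**
   * Removes envelopes whose group committed successfully (the buffer must be
   * mutable) and returns, per partition, the offset the consumer may advance to.
   */
  static Map<Integer, Long> drainAndComputeOffsets(
      List<Envelope> buffer, Set<String> committedGroups) {
    // Offset following the highest committed envelope in each partition.
    Map<Integer, Long> safe = new HashMap<>();
    Iterator<Envelope> it = buffer.iterator();
    while (it.hasNext()) {
      Envelope e = it.next();
      if (committedGroups.contains(e.groupId)) {
        safe.merge(e.partition, e.offset + 1, Long::max);
        it.remove(); // selective draining: failed groups stay buffered
      }
    }
    // Whatever remains is uncommitted; its minimum offset caps the advance.
    Map<Integer, Long> minUncommitted = new HashMap<>();
    for (Envelope e : buffer) {
      minUncommitted.merge(e.partition, e.offset, Long::min);
    }
    safe.putAll(minUncommitted);
    return safe;
  }
}
```

With envelopes at offsets 10 (group a), 11 (group b), and 12 (group a) on one partition, and only group a committed, the buffer keeps the offset-11 envelope and that partition's offset stops at 11 rather than moving to 13, so the failed group is re-read after a restart.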
   
   Happy to push commits on your branch or open a stacked PR — whatever works 
best.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

