ruotianwang opened a new issue, #12076:
URL: https://github.com/apache/iceberg/issues/12076

   ### Apache Iceberg version
   
   1.7.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   While using `partial-progress.max-failed-commits`, we found that the 
threshold check reports false positives far too often. After digging in, the 
behavior traces back to this PR: https://github.com/apache/iceberg/pull/9611
   
   The check reads `succeededCommits` from the commit service, which is 
incremented on every successful commit, and then computes 
`int failedCommits = maxCommits - commitService.succeededCommits();`
   
   However, we found several cases where, even though 
`partial-progress.max-commits` is set, Iceberg internally batches the file 
groups into fewer commits than `max-commits`; that is, the actual number of 
commits can be smaller than `maxCommits`. In that case, the check above counts 
commits that were never attempted as failures, so the threshold comparison is 
wrong.
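   A minimal Java sketch of the arithmetic makes the miscount concrete. It 
assumes, for illustration only, that file groups are batched 
`ceil(totalGroups / maxCommits)` per commit; `DerivedFailureCheck`, 
`actualCommits`, and `derivedFailedCommits` are hypothetical names, not 
Iceberg's actual classes or methods.

```java
/** Hypothetical sketch of a succeeded-commit-based failure check; not Iceberg's code. */
final class DerivedFailureCheck {

  /** Commits actually attempted, assuming groups are batched ceil(groups / maxCommits) per commit. */
  static int actualCommits(int totalGroups, int maxCommits) {
    if (totalGroups == 0) {
      return 0;
    }
    int groupsPerCommit = (totalGroups + maxCommits - 1) / maxCommits; // ceiling division
    return (totalGroups + groupsPerCommit - 1) / groupsPerCommit;      // ceiling division
  }

  /** The derived count described in this issue: maxCommits - succeededCommits. */
  static int derivedFailedCommits(int maxCommits, int succeededCommits) {
    return maxCommits - succeededCommits;
  }

  public static void main(String[] args) {
    int maxCommits = 10;      // partial-progress.max-commits
    int maxFailedCommits = 2; // partial-progress.max-failed-commits
    int totalGroups = 3;      // planner produced only 3 file groups

    int attempted = actualCommits(totalGroups, maxCommits); // only 3 commits are attempted
    int succeeded = attempted;                              // assume every commit succeeded
    int failed = derivedFailedCommits(maxCommits, succeeded); // 7, although nothing failed

    System.out.printf("attempted=%d succeeded=%d derivedFailed=%d thresholdExceeded=%b%n",
        attempted, succeeded, failed, failed > maxFailedCommits);
  }
}
```

   Under the same batching assumption the gap also appears when there are more 
groups than `maxCommits`: 11 groups with `maxCommits = 10` yield 2 groups per 
commit and only 6 commits, so 4 "failures" are reported even if all 6 succeed.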
   
   The suggested solution is to stop deriving failures from the 
succeeded-commit count and instead count failed commits directly, comparing 
that count against `partial-progress.max-failed-commits`.
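   The proposed approach could look roughly like the sketch below. It is a 
hypothetical tracker, not Iceberg's commit service: `FailureCountingCommitTracker`, 
`commit`, and `checkFailures` are illustrative names, and treating any 
`RuntimeException` from a commit as one failed commit is an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical tracker that counts failed commits directly instead of deriving them. */
final class FailureCountingCommitTracker {
  private final AtomicInteger succeeded = new AtomicInteger();
  private final AtomicInteger failed = new AtomicInteger();

  /** Runs one commit attempt and records its outcome; a RuntimeException counts as one failure. */
  void commit(Runnable commitAction) {
    try {
      commitAction.run();
      succeeded.incrementAndGet();
    } catch (RuntimeException e) {
      failed.incrementAndGet();
    }
  }

  int succeededCommits() {
    return succeeded.get();
  }

  int failedCommits() {
    return failed.get();
  }

  /** Threshold check based only on observed failures, independent of max-commits. */
  void checkFailures(int maxFailedCommits) {
    int failures = failed.get();
    if (failures > maxFailedCommits) {
      throw new IllegalStateException(String.format(
          "%d commits failed, exceeding partial-progress.max-failed-commits=%d",
          failures, maxFailedCommits));
    }
  }

  public static void main(String[] args) {
    FailureCountingCommitTracker tracker = new FailureCountingCommitTracker();
    // Only 3 commits happen (fewer than a max-commits of 10), and all succeed.
    for (int i = 0; i < 3; i++) {
      tracker.commit(() -> { });
    }
    tracker.checkFailures(2); // no false positive: zero observed failures
    System.out.println("succeeded=" + tracker.succeededCommits()
        + " failed=" + tracker.failedCommits());
  }
}
```

   Because the failure count only grows when a commit attempt actually fails, 
the check no longer depends on how many commits the planner ended up producing.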
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

