ruotianwang opened a new issue, #12076: URL: https://github.com/apache/iceberg/issues/12076
### Apache Iceberg version

1.7.1 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

While using `partial-progress.max-failed-commits`, we found that the threshold check produces too many false positives.

The check was introduced in this PR: https://github.com/apache/iceberg/pull/9611. It tracks `succeededCommits`, which is incremented whenever a commit succeeds, and then computes `int failedCommits = maxCommits - commitService.succeededCommits();`.

However, I have found several cases where, even though `partial-progress.max-commits` is set, Iceberg internally packs the file groups into fewer commits than that value. In other words, the actual number of commits can be smaller than `maxCommits`. When that happens, every commit that was never attempted is counted as a failure, so the threshold check above gives the wrong result.

The suggested solution is to stop deriving failures from the succeeded-commit count and instead count failed commits directly, then compare that count against the threshold.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
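To make the false positive in the bug description concrete, here is a minimal standalone Java sketch of the two ways of counting. It is not Iceberg code: the class `FailedCommitCheck`, both helper methods, and the example numbers are hypothetical, and it assumes the check fails the action when the failed-commit count is strictly greater than `partial-progress.max-failed-commits`.

```java
// Hypothetical illustration only; none of these names exist in Iceberg.
class FailedCommitCheck {

    // Approach described in the issue: derive failures from the configured maximum.
    static int derivedFailedCommits(int maxCommits, int succeededCommits) {
        return maxCommits - succeededCommits;
    }

    // Assumed threshold rule: fail when failures exceed the allowed number.
    static boolean exceedsThreshold(int failedCommits, int maxFailedCommits) {
        return failedCommits > maxFailedCommits;
    }

    public static void main(String[] args) {
        int maxCommits = 10;       // partial-progress.max-commits
        int maxFailedCommits = 2;  // partial-progress.max-failed-commits

        // Assumed scenario: the file groups were packed into only 6 commits,
        // and all 6 of them succeeded.
        int succeededCommits = 6;
        int countedFailures = 0;   // failures counted directly, as the issue suggests

        int derived = derivedFailedCommits(maxCommits, succeededCommits);

        // Derived count is 4, which exceeds 2: the check trips with no real failure.
        System.out.println("derived failures = " + derived
            + ", trips check = " + exceedsThreshold(derived, maxFailedCommits));

        // Direct count is 0, which does not exceed 2: the check passes.
        System.out.println("counted failures = " + countedFailures
            + ", trips check = " + exceedsThreshold(countedFailures, maxFailedCommits));
    }
}
```

In this scenario the derived count reports four failures even though nothing failed, while the directly counted value stays at zero.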
