Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4155#issuecomment-71956823

> The users on my side have been able to reproduce the missing files issue reliably, so we may just have to live with an empirical verification and be done with it.

Given that the actual bug is non-deterministic, I think we could be okay without a regression test that reliably reproduces this issue. There might be some value in a non-deterministic regression test, though, as long as it detects the bug with sufficiently high probability, since we'd eventually catch any regression by noticing that the test had become flaky in Jenkins.

Unless we can come up with a better test, in the immediate term I'm okay with having unit tests for the individual components and an empirical verification using your reproduction. Even though they aren't regression tests, the new tests added here will be helpful for preventing regressions if anyone changes the OutputCommitCoordinator logic.
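To put a rough number on "sufficiently high probability": the sketch below assumes each Jenkins run of a non-deterministic test fails independently with some fixed probability once the bug has regressed. The function names and the example probabilities are hypothetical and are not measurements from this pull request.

```python
import math


def detection_probability(p_fail_per_run: float, runs: int) -> float:
    """Chance that at least one of `runs` independent runs fails.

    Assumes every run hits the bug with the same probability
    `p_fail_per_run` (an illustrative assumption, not a measured value).
    """
    if not 0.0 <= p_fail_per_run <= 1.0:
        raise ValueError("p_fail_per_run must be in [0, 1]")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return 1.0 - (1.0 - p_fail_per_run) ** runs


def runs_needed(p_fail_per_run: float, confidence: float) -> int:
    """Smallest number of runs whose detection probability reaches `confidence`."""
    if not 0.0 < p_fail_per_run < 1.0:
        raise ValueError("p_fail_per_run must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail_per_run))
```

Under these assumptions, a test that trips on only 10% of runs would still be expected to show up as flaky with about 95% probability within `runs_needed(0.1, 0.95)` builds, which is the sense in which a regression would eventually be noticed.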