Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
@tgravescs thanks for testing it out! I've created
https://issues.apache.org/jira/browse/SPARK-25341 and
https://issues.apache.org/jira/browse/SPARK-25342 to track the follow-up work.
I think these two together are the long-term solution. Users can sort or
checkpoint to eliminate the indeterminacy, or use reliable shuffle storage to
avoid fetch failures (someone is proposing this on the dev list). If users
can't avoid it and hit the issue, this PR provides a final guard that reruns
some stages to get the correct result. For Spark 2.4 we just fail the job, and
we will finish the above two tickets in Spark 3.0.
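
To illustrate why sorting can remove the indeterminacy, here is a minimal sketch in plain Python (not Spark code). It assumes the indeterminacy comes from an order-sensitive step such as round-robin partitioning applied to input whose arrival order may differ between a task's first attempt and its retry. The names `round_robin`, `first_attempt`, and `retry_attempt` are hypothetical and only for illustration.

```python
def round_robin(records, num_partitions):
    """Assign records to partitions by position, so the output depends on input order."""
    partitions = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        partitions[i % num_partitions].append(rec)
    return partitions


# Hypothetical arrival orders of the same six records in two task attempts.
first_attempt = [3, 1, 4, 0, 5, 2]
retry_attempt = [5, 0, 2, 3, 1, 4]

# Without sorting, the retry places records in different partitions.
unsorted_first = round_robin(first_attempt, 2)
unsorted_retry = round_robin(retry_attempt, 2)
print(unsorted_first == unsorted_retry)

# Sorting first fixes the order, so both attempts agree.
sorted_first = round_robin(sorted(first_attempt), 2)
sorted_retry = round_robin(sorted(retry_attempt), 2)
print(sorted_first == sorted_retry)
```

In this sketch the unsorted attempts disagree on which records land in which partition, while the sorted attempts produce identical partitions, which is the property a rerun needs for a correct result.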