[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

cloud-fan Wed, 05 Sep 2018 10:40:59 -0700

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22112
  
    @tgravescs thanks for testing it out! I've created 
https://issues.apache.org/jira/browse/SPARK-25341 and 
https://issues.apache.org/jira/browse/SPARK-25342 to track the follow-up work.
    
    I think these two together are the long-term solution. Users can 
sort or checkpoint the data to eliminate the indeterminacy, or use reliable 
shuffle storage to avoid fetch failures (someone is proposing this on the dev 
list). If users can't avoid it and hit the issue, this PR provides a final guard 
that reruns some stages to get the correct result. For Spark 2.4 we just fail 
the job, and we will finish the above two tickets in Spark 3.0.
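
    To illustrate why sorting removes the indeterminacy, here is a minimal 
sketch in plain Python. It is a simplified model, not Spark's implementation: 
the `round_robin` helper and the sample records are hypothetical, and it only 
assumes that a repartition assigns records to output partitions by their 
position in the input. Under that assumption, a retry that reads the same 
records in a different order produces different partitions, while sorting 
first makes both attempts agree.

```python
def round_robin(records, num_partitions):
    """Assign records to partitions by input position.

    Hypothetical stand-in for a position-based repartition; not Spark code.
    """
    partitions = [[] for _ in range(num_partitions)]
    for position, record in enumerate(records):
        partitions[position % num_partitions].append(record)
    return partitions


# The same records, arriving in two different orders
# (for example, the first attempt and a retry after a fetch failure).
first_attempt = ["a", "b", "c", "d"]
retry_attempt = ["b", "a", "d", "c"]

# Without sorting, the partition contents depend on arrival order.
unsorted_first = round_robin(first_attempt, 2)
unsorted_retry = round_robin(retry_attempt, 2)

# Sorting first fixes the order, so both attempts produce the same layout.
sorted_first = round_robin(sorted(first_attempt), 2)
sorted_retry = round_robin(sorted(retry_attempt), 2)

print(unsorted_first == unsorted_retry)  # False
print(sorted_first == sorted_retry)      # True
```

    Checkpointing would play a similar role in this model by pinning the input 
to one materialized order instead of recomputing it on retry.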


