I agree with Imran: we need to fix SPARK-23243, and any other correctness issues
for that matter.
Tom
    On Wednesday, August 8, 2018, 9:06:43 AM CDT, Imran Rashid 
<iras...@cloudera.com.INVALID> wrote:  
 
 On Tue, Aug 7, 2018 at 8:39 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
SPARK-23243: Shuffle+Repartition on an RDD could lead to incorrect answers
It turns out to be a very complicated issue; there is no consensus yet on the
right fix. It is likely to miss Spark 2.4 because it's a long-standing issue,
not a regression.

This is a really serious data-loss bug. Yes, it's very complex, but we
absolutely have to fix it; I really think it should be in 2.4. Has work on it
stopped?
