[ https://issues.apache.org/jira/browse/SPARK-23243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16608169#comment-16608169 ]
Bruce Robbins commented on SPARK-23243:
---------------------------------------

BTW, I took a stab at backporting it to 2.2, but to get it to work I also had to backport SPARK-20715. So my version has an additional 398 changed lines (from the extra backport of SPARK-20715). I can post that, but I thought someone might have a smaller version.

> Shuffle+Repartition on an RDD could lead to incorrect answers
> -------------------------------------------------------------
>
>                 Key: SPARK-23243
>                 URL: https://issues.apache.org/jira/browse/SPARK-23243
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.0, 2.0.0, 2.1.0, 2.2.0, 2.3.0
>            Reporter: Jiang Xingbo
>            Assignee: Wenchen Fan
>            Priority: Blocker
>              Labels: correctness
>             Fix For: 2.3.2, 2.4.0
>
>
> RDD repartition also distributes data in a round-robin fashion, which
> can cause incorrect answers on RDD workloads in the same way as in
> https://issues.apache.org/jira/browse/SPARK-23207
> The approach that fixes DataFrame.repartition() doesn't apply to the RDD
> repartition issue, as discussed in
> https://github.com/apache/spark/pull/20393#issuecomment-360912451
> We track alternative solutions for this issue in this task.
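
For anyone skimming the thread, here is a rough illustration of why round-robin distribution can give wrong answers. It is a simplified pure-Python model, not Spark code: the function name round_robin_partition is made up for this sketch, and the failure scenario (one output partition kept from the first attempt, the other recomputed after the input arrives in a different order) is an assumption chosen to show the effect.

```python
def round_robin_partition(records, num_partitions, start=0):
    """Assign each record to a partition by its position, not its content.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[(start + i) % num_partitions].append(record)
    return partitions


# First attempt: the upstream records arrive in one order.
first_attempt = round_robin_partition(["a", "b", "c", "d"], 2)

# Retry: same records, but they arrive in a different order
# (assumed here; e.g. shuffle blocks fetched in a different sequence).
retry = round_robin_partition(["b", "a", "d", "c"], 2)

# Assume partition 0 from the first attempt was already consumed
# downstream and only partition 1 is recomputed from the retry.
combined = first_attempt[0] + retry[1]

print(first_attempt)     # [['a', 'c'], ['b', 'd']]
print(retry)             # [['b', 'd'], ['a', 'c']]
print(sorted(combined))  # ['a', 'a', 'c', 'c']
```

In this model "b" and "d" are lost and "a" and "c" are duplicated, because the placement of a record depends only on where it falls in the input order.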