[ https://issues.apache.org/jira/browse/SPARK-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965832#comment-14965832 ]
Reynold Xin commented on SPARK-8029:
------------------------------------

It'd be really good to fix this in 1.6, and maybe even backport it to older
branches. [~irashid] Would you have time to give "Executors Commit
ShuffleMapOutput: First Attempt Wins" from your design proposal a try? It
seems like a much smaller fix, and the chance of that fix causing problems
is pretty low (even though you call it "optimistic").

> ShuffleMapTasks must be robust to concurrent attempts on the same executor
> --------------------------------------------------------------------------
>
>                  Key: SPARK-8029
>                  URL: https://issues.apache.org/jira/browse/SPARK-8029
>              Project: Spark
>           Issue Type: Bug
>           Components: Spark Core
>     Affects Versions: 1.4.0
>             Reporter: Imran Rashid
>             Assignee: Imran Rashid
>             Priority: Critical
>          Attachments: AlternativesforMakingShuffleMapTasksRobusttoMultipleAttempts.pdf
>
> When stages get retried, a task may have more than one attempt running at
> the same time, on the same executor. Currently this causes problems for
> ShuffleMapTasks, since all attempts try to write to the same output files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
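For illustration, the "first attempt wins" idea discussed above can be sketched as follows. This is a minimal Python sketch, not Spark's actual Scala implementation: each attempt writes its shuffle output to an attempt-private temp file, then tries to publish it atomically, so exactly one attempt's output becomes visible and later attempts become no-ops instead of clobbering the winner's files. The function name `commit_shuffle_output` and the use of a hard link as the atomic publish step are assumptions made for this sketch.

```python
import os


def commit_shuffle_output(data: bytes, final_path: str, attempt_id: int) -> bool:
    """First-attempt-wins commit sketch (illustrative, not Spark's code).

    Each attempt writes to its own temp file, then atomically publishes it.
    Only one attempt can create `final_path`, so the first to commit wins and
    every other attempt discards its output instead of overwriting.
    Returns True if this attempt won the commit, False otherwise.
    """
    tmp_path = f"{final_path}.attempt-{attempt_id}"
    with open(tmp_path, "wb") as f:
        f.write(data)
    try:
        # os.link fails with FileExistsError if final_path already exists,
        # so exactly one attempt succeeds in publishing its file.
        os.link(tmp_path, final_path)
        committed = True
    except FileExistsError:
        committed = False
    finally:
        # The attempt-private temp file is no longer needed either way.
        os.remove(tmp_path)
    return committed
```

The key property is that the publish step is a single atomic filesystem operation, so two concurrent attempts on the same executor can never interleave partial writes into the same output file; the loser simply reuses the winner's already-committed map output.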