[ https://issues.apache.org/jira/browse/SPARK-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103184#comment-15103184 ]
Sean Owen commented on SPARK-1747:
----------------------------------

[~tgraves] is there a chance this is going to result in a change?

> check for Spark on Yarn ApplicationMaster split brain
> -----------------------------------------------------
>
>                 Key: SPARK-1747
>                 URL: https://issues.apache.org/jira/browse/SPARK-1747
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Thomas Graves
>
> On YARN there is a possibility that applications can end up with an issue
> referred to as "split brain". The problem is that you have one Application
> Master running and something happens, such as a network split, so that the
> AM can no longer talk to the ResourceManager. After some time the
> ResourceManager will start a new application attempt, assuming the old one
> failed, and you end up with 2 application masters. Note that the network
> split could prevent the old AM from talking to the RM while it is still
> running along and contacting regular executors.
> If the previous AM does not need any more resources from the RM, it could try
> to commit. This could cause lots of problems when the second AM finishes and
> tries to commit too. This could potentially result in data corruption.
> I believe this same issue can happen on Spark since it's using the Hadoop
> output formats. One instance that has this issue is the FileOutputCommitter.
> It first writes to a temporary directory (task commit) and then moves the
> file to the final directory (job commit). The first AM could finish the job
> commit and tell the user it's done; the user starts another downstream job,
> but then the second AM comes in to do the job commit, and files the
> downstream job is processing could disappear until the second AM finishes
> the job commit.
> This was fixed in MR by https://issues.apache.org/jira/browse/MAPREDUCE-4832
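
For illustration only, the race described in the issue and one generic guard against it can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Spark's, YARN's, or Hadoop's code, and not necessarily the approach taken in MAPREDUCE-4832. The names ResourceManagerStub, StaleAttemptError, and job_commit are hypothetical. The guard fails closed: an attempt that cannot get the RM to confirm it is still the current attempt (for example because it is partitioned away) does not perform the job commit.

```python
import os
import shutil
import tempfile


class ResourceManagerStub:
    """Hypothetical stand-in for the RM: remembers the newest attempt number."""

    def __init__(self):
        self.current_attempt = 0

    def start_attempt(self):
        self.current_attempt += 1
        return self.current_attempt

    def confirm_current(self, attempt_id, partitioned=False):
        # A partitioned AM cannot reach the RM, so it gets no confirmation.
        if partitioned:
            return False
        return attempt_id == self.current_attempt


class StaleAttemptError(RuntimeError):
    """Raised when an attempt may not perform the job commit."""


def job_commit(rm, attempt_id, temp_dir, final_dir, partitioned=False):
    """Move task output into the final directory only if the RM confirms us."""
    if not rm.confirm_current(attempt_id, partitioned=partitioned):
        raise StaleAttemptError(f"attempt {attempt_id} is not confirmed as current")
    os.makedirs(final_dir, exist_ok=True)
    for name in os.listdir(temp_dir):
        shutil.move(os.path.join(temp_dir, name), os.path.join(final_dir, name))


def demo():
    root = tempfile.mkdtemp()
    try:
        rm = ResourceManagerStub()
        final_dir = os.path.join(root, "out")

        # First attempt writes its task output to a temporary directory.
        first = rm.start_attempt()
        first_tmp = os.path.join(root, "tmp-1")
        os.makedirs(first_tmp)
        with open(os.path.join(first_tmp, "part-00000"), "w") as f:
            f.write("from attempt 1\n")

        # Network split: the RM gives up on attempt 1 and starts attempt 2.
        second = rm.start_attempt()
        second_tmp = os.path.join(root, "tmp-2")
        os.makedirs(second_tmp)
        with open(os.path.join(second_tmp, "part-00000"), "w") as f:
            f.write("from attempt 2\n")

        # The old AM is still running and tries to commit; the guard stops it.
        blocked = False
        try:
            job_commit(rm, first, first_tmp, final_dir, partitioned=True)
        except StaleAttemptError:
            blocked = True

        # The current attempt commits normally.
        job_commit(rm, second, second_tmp, final_dir)
        return blocked, sorted(os.listdir(final_dir))
    finally:
        shutil.rmtree(root)
```

Calling demo() runs both attempts against a scratch directory and returns whether the stale commit was refused, plus the files that ended up in the final directory. Without the confirmation check, both attempts would move files into the same final directory, which is the window in which a downstream job could see output disappear or change.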