[ 
https://issues.apache.org/jira/browse/SPARK-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103184#comment-15103184
 ] 

Sean Owen commented on SPARK-1747:
----------------------------------

[~tgraves] is there a chance this is going to result in a change?

> check for Spark on Yarn ApplicationMaster split brain
> -----------------------------------------------------
>
>                 Key: SPARK-1747
>                 URL: https://issues.apache.org/jira/browse/SPARK-1747
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Thomas Graves
>
> On YARN, applications can end up in a state referred to as "split brain".
> The problem arises when one ApplicationMaster (AM) is running and something
> such as a network split leaves the AM unable to talk to the
> ResourceManager (RM). After some time the RM starts a new application
> attempt, assuming the old one failed, and you end up with two AMs. Note
> that the network split could prevent the old AM from talking to the RM
> while it keeps running and contacting regular executors.
> If the previous AM does not need any more resources from the RM, it could try
> to commit. This causes problems when the second AM finishes and tries to
> commit too, and could potentially result in data corruption.
> I believe the same issue can happen on Spark, since it uses the Hadoop
> output formats. One component that has this issue is the FileOutputCommitter.
> It first writes to a temporary directory (task commit) and then moves the
> files to the final directory (job commit). The first AM could finish the job
> commit and tell the user it's done; the user then starts a downstream job, but
> the second AM then comes in to do its job commit, and files the downstream
> job is processing could disappear until the second AM finishes the job
> commit.
> This was fixed in MR by https://issues.apache.org/jira/browse/MAPREDUCE-4832
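
To make the job-commit race described above concrete, here is a minimal local-filesystem sketch of one possible guard: an exclusively created marker file that lets only one application attempt perform the job commit. This is an illustration only, not Spark's code and not necessarily how MAPREDUCE-4832 addresses the problem. The names commit_job, demo, and _COMMIT_STARTED are hypothetical, and a local directory stands in for the shared output directory.

```python
import os
import shutil
import tempfile


def commit_job(attempt_dir, final_dir, marker_name="_COMMIT_STARTED"):
    """Move one attempt's output into final_dir unless another attempt got there first.

    Returns True if this attempt performed the commit, False if it backed off.
    All names here are hypothetical and exist only for this sketch.
    """
    os.makedirs(final_dir, exist_ok=True)
    marker = os.path.join(final_dir, marker_name)
    try:
        # O_CREAT | O_EXCL fails if the marker already exists, so at most
        # one attempt can get past this point.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    # "Job commit": move this attempt's files into the final directory.
    for name in os.listdir(attempt_dir):
        shutil.move(os.path.join(attempt_dir, name), os.path.join(final_dir, name))
    return True


def demo():
    """Simulate two application attempts that both try to commit the same job."""
    root = tempfile.mkdtemp()
    final_dir = os.path.join(root, "output")
    results = []
    for attempt in ("attempt_1", "attempt_2"):
        attempt_dir = os.path.join(root, attempt)
        os.makedirs(attempt_dir)
        with open(os.path.join(attempt_dir, "part-00000"), "w") as f:
            f.write(attempt)
        results.append(commit_job(attempt_dir, final_dir))
    with open(os.path.join(final_dir, "part-00000")) as f:
        winner = f.read()
    shutil.rmtree(root)
    return results, winner
```

In this sketch the second attempt backs off instead of replacing files that a downstream job may already be reading. A real fix would also have to handle an attempt that dies partway through the commit, which the sketch does not cover.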


