[ 
https://issues.apache.org/jira/browse/SPARK-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-1747:
--------------------------------
    Labels: bulk-closed  (was: )

> check for Spark on Yarn ApplicationMaster split brain
> -----------------------------------------------------
>
>                 Key: SPARK-1747
>                 URL: https://issues.apache.org/jira/browse/SPARK-1747
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Thomas Graves
>            Priority: Major
>              Labels: bulk-closed
>
> On YARN, an application can end up in a state referred to as "split brain".
> One ApplicationMaster (AM) is running, and then something such as a network
> split prevents it from talking to the ResourceManager (RM). After some time
> the RM assumes the old attempt failed and starts a new application attempt,
> so you end up with two ApplicationMasters. Note that the network split may
> only cut the old AM off from the RM; it could still be running and
> contacting its executors.
> If the previous AM does not need any more resources from the RM, it could try
> to commit. This causes problems when the second AM finishes and tries to
> commit too, and could potentially result in data corruption.
> I believe the same issue can happen in Spark, since it uses the Hadoop
> output formats. One component with this problem is the FileOutputCommitter.
> It first writes to a temporary directory (task commit) and then moves the
> files to the final directory (job commit). The first AM could finish the job
> commit and tell the user it is done, and the user could start a downstream
> job. If the second AM then performs its own job commit, files that the
> downstream job is processing could disappear until the second AM finishes
> the job commit.
> This was fixed in MR by https://issues.apache.org/jira/browse/MAPREDUCE-4832
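
The double job commit described above can be sketched with plain local directories. This is only an illustration under stated assumptions, not Spark or Hadoop code: `write_attempt`, `job_commit`, `guarded_job_commit`, and the `_COMMIT_CLAIMED` marker name are all hypothetical, and the guard shown (an atomically created marker file that lets exactly one attempt commit) is one possible approach, not necessarily what MAPREDUCE-4832 implements.

```python
import os
import tempfile
from pathlib import Path


def write_attempt(base: Path, attempt: int) -> Path:
    """Simulate task commit: one output file in a per-attempt temporary dir."""
    tmp = base / f"_temporary_{attempt}"
    tmp.mkdir(parents=True)
    (tmp / "part-00000").write_text(f"attempt {attempt}")
    return tmp


def job_commit(attempt_tmp: Path, final_dir: Path) -> None:
    """Unguarded job commit: move every task output file into final_dir."""
    final_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(attempt_tmp.iterdir()):
        dst = final_dir / src.name
        if dst.exists():
            # Between unlink and rename, a downstream reader sees no file.
            dst.unlink()
        src.rename(dst)


def guarded_job_commit(attempt_tmp: Path, final_dir: Path, marker: Path) -> bool:
    """Commit only if no other attempt has already claimed the job commit."""
    try:
        # O_CREAT | O_EXCL fails if the marker exists, so one attempt wins.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    job_commit(attempt_tmp, final_dir)
    return True


def run_demo() -> dict:
    results = {}
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)

        # Unguarded: the second attempt replaces output already handed out.
        base = root / "unguarded"
        out = base / "out"
        job_commit(write_attempt(base, 1), out)
        job_commit(write_attempt(base, 2), out)
        results["unguarded"] = (out / "part-00000").read_text()

        # Guarded: the second attempt finds the marker and skips its commit.
        base = root / "guarded"
        out = base / "out"
        marker = base / "_COMMIT_CLAIMED"
        first = guarded_job_commit(write_attempt(base, 1), out, marker)
        second = guarded_job_commit(write_attempt(base, 2), out, marker)
        results["guarded"] = (out / "part-00000").read_text()
        results["commits"] = (first, second)
    return results


if __name__ == "__main__":
    print(run_demo())
```

In the unguarded run the final file ends up holding the second attempt's data, which is the situation where a downstream job can see its input vanish or change. In the guarded run the late attempt backs off. A real fix would also need the marker to live on the shared filesystem and would have to handle an attempt that dies after claiming the commit; the sketch ignores both.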



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
