[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4832:
----------------------------------

    Attachment: MAPREDUCE-4832.patch

Patch that implements the "commit window" concept outlined above.  The AM will 
not allow task commits or job commit to proceed unless it has heard back from 
the RM within the configured amount of time (10 seconds by default).
                
> MR AM can get in a split brain situation
> ----------------------------------------
>
>                 Key: MAPREDUCE-4832
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4832
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Robert Joseph Evans
>            Priority: Critical
>         Attachments: MAPREDUCE-4832.patch
>
>
> It is possible for a networking issue to happen where the RM thinks an AM has 
> gone down and launches a replacement, but the previous AM is still up and 
> running.  If the previous AM does not need any more resources from the RM it 
> could try to commit either tasks or jobs.  This could cause lots of problems 
> where the second AM finishes and tries to commit too.  This could result in 
> data corruption.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to