[jira] [Commented] (MAPREDUCE-4611) MR AM dies badly when Node is decommissioned

Hudson (JIRA) Sat, 01 Sep 2012 07:02:11 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446711#comment-13446711 ]


Hudson commented on MAPREDUCE-4611:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #1183 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1183/])
    MAPREDUCE-4611. MR AM dies badly when Node is decommissioned (Robert Evans via tgraves) (Revision 1379599)

     Result = SUCCESS
tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1379599
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java

                
> MR AM dies badly when Node is decommissioned
> --------------------------------------------
>
>                 Key: MAPREDUCE-4611
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4611
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.3, 3.0.0, 2.2.0-alpha
>
>         Attachments: MR-4611.txt
>
>
> The MR AM always thinks that it is being killed by the RM when it gets a kill 
> signal and it has not finished processing yet.  In reality the RM kill signal 
> is only sent when the client cannot communicate directly with the AM, which 
> probably means that the AM is in a bad state already.  The much more common 
> case is that the node is marked as unhealthy or decommissioned.
> I propose that in the short term the AM will only clean up if 
>  # The process has been asked by the client to exit (kill)
>  # The job has finished cleanly and the process is already exiting
>  # This is the last of the AM retries.
> The downside here is that, in some cases, the .staging directory will be 
> leaked and the job will not show up in the history server on a kill from 
> the RM, at least until the full set of AM cleanup issues can be addressed, 
> probably as part of MAPREDUCE-4428.
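
The three conditions proposed in the description above can be read as a single boolean check. The following is only a minimal sketch of that reading, not the code committed in revision 1379599: the class name, method name, and parameter names are all hypothetical and do not correspond to fields or methods in MRAppMaster.

```java
// Illustrative sketch only. CleanupDecision and its parameters are
// hypothetical names, not part of the Hadoop code base.
final class CleanupDecision {

    private CleanupDecision() {
    }

    /**
     * Returns true if the AM should clean up (for example, remove the
     * .staging directory) when it receives a kill signal.
     *
     * @param killedByClient     the client asked the process to exit (kill)
     * @param jobFinishedCleanly the job finished cleanly and the process is exiting
     * @param isLastRetry        this attempt is the last of the AM retries
     */
    static boolean shouldCleanUp(boolean killedByClient,
                                 boolean jobFinishedCleanly,
                                 boolean isLastRetry) {
        // Any one of the three proposed conditions is enough to clean up.
        return killedByClient || jobFinishedCleanly || isLastRetry;
    }

    public static void main(String[] args) {
        // Node decommissioned mid-job with retries remaining: no cleanup,
        // so a later AM attempt can still use the staging data.
        System.out.println(shouldCleanUp(false, false, false));
        // Same kill signal, but on the last AM retry: clean up.
        System.out.println(shouldCleanUp(false, false, true));
    }
}
```

Under this reading, a kill signal caused by a decommissioned or unhealthy node on any attempt other than the last one leaves the staging directory in place.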

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
