[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113309#comment-13113309
 ] 

Devaraj K commented on MAPREDUCE-3070:
--------------------------------------

When the node manager is restarted for any reason (for example, cluster 
maintenance), the NM currently has to wait until 
"yarn.resourcemanager.nm.liveness-monitor.expiry-interval-ms", which is 10 
minutes by default, has elapsed before it can register again. Decreasing the 
default expiry time is not a feasible fix either. 

The proposal is that we clean up the old NM state and register the NM even if 
the registration is requested before the old NM entry has expired.
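
A rough sketch of the idea, for discussion only. This is not the actual RM 
code: the class NodeRegistrySketch, its methods and the Result enum are all 
hypothetical names made up for illustration. It only shows the intended 
behaviour: a registration from a node that is still tracked triggers a cleanup 
of the old entry instead of a "Duplicate registration" failure.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the proposal; not the real ResourceManager classes. */
class NodeRegistrySketch {

    /** Outcome of a registration request (illustrative only). */
    enum Result { REGISTERED, REPLACED_OLD_ENTRY }

    // nodeId -> time (ms) of the last registration for that node
    private final Map<String, Long> liveNodes = new HashMap<>();
    private int cleanups = 0;

    /**
     * Registers a node. If the node is still tracked (it restarted before its
     * old entry expired), the old entry is cleaned up first instead of
     * rejecting the request as a duplicate.
     */
    Result register(String nodeId, long nowMs) {
        Result result = Result.REGISTERED;
        if (liveNodes.containsKey(nodeId)) {
            cleanup(nodeId);
            result = Result.REPLACED_OLD_ENTRY;
        }
        liveNodes.put(nodeId, nowMs);
        return result;
    }

    /** Placeholder for releasing whatever state is tied to the old NM instance. */
    private void cleanup(String nodeId) {
        liveNodes.remove(nodeId);
        cleanups++;
    }

    int cleanupCount() {
        return cleanups;
    }

    boolean isRegistered(String nodeId) {
        return liveNodes.containsKey(nodeId);
    }

    public static void main(String[] args) {
        NodeRegistrySketch registry = new NodeRegistrySketch();
        // First start of the NM, then a restart one minute later,
        // well before the 10 minute expiry interval.
        System.out.println(registry.register("host1:1234", 0L));
        System.out.println(registry.register("host1:1234", 60_000L));
    }
}
```

In this sketch the second call succeeds and reports that the old entry was 
replaced. What the cleanup step has to release in the real RM is left open 
here.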

> NM not able to register with RM after NM restart
> ------------------------------------------------
>
>                 Key: MAPREDUCE-3070
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3070
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Ravi Teja Ch N V
>            Assignee: Devaraj K
>            Priority: Blocker
>
> After stopping the NM gracefully and then starting it again, NM registration 
> with the RM fails with a "Duplicate registration from the node!" error.
> {noformat} 
> 2011-09-23 01:50:46,705 FATAL nodemanager.NodeManager 
> (NodeManager.java:main(204)) - Error starting NodeManager
> org.apache.hadoop.yarn.YarnException: Failed to Start 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager
>       at 
> org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:153)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:202)
> Caused by: org.apache.avro.AvroRuntimeException: 
> org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: 
> Duplicate registration from the node!
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:141)
>       at 
> org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
>       ... 2 more
> Caused by: 
> org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: 
> Duplicate registration from the node!
>       at 
> org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:142)
>       at $Proxy13.registerNodeManager(Unknown Source)
>       at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:175)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:137)
>       ... 3 more
> {noformat} 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
