[jira] [Commented] (MAPREDUCE-7123) AM Failed with Communication error to RM

2019-10-29 Thread Amithsha (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962711#comment-16962711
 ] 

Amithsha commented on MAPREDUCE-7123:
-

[~cane] we closed this since the versions are not same (version running on 
container and version running on RM ) May version be  incompatibility.

> AM Failed with Communication error to RM
> 
>
> Key: MAPREDUCE-7123
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7123
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2, yarn
>Affects Versions: 2.9.0
>Reporter: Amithsha
>Priority: Major
>
> During the restart of nodemanagers in 300 node cluster some jobs failed with 
> the following exceptions.
> But the nodes where the AM launched is not the part of cluster.
> FATAL [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$UpdatedNodesTransition.transition(JobImpl.java:2146)
>  at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$UpdatedNodesTransition.transition(JobImpl.java:2139)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>  at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:998) 
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138) 
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1346)
>  at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1342)
>  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) 
> at java.lang.Thread.run(Thread.java:745) 2018-07-14 12:34:53,425 ERROR 
> [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleUpdatedNodes(RMContainerAllocator.java:875)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:776)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:256)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:281)
>  at java.lang.Thread.run(Thread.java:745) 2018-07-14 12:34:53,427 INFO 
> [AsyncDispatcher ShutDown handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7123) AM Failed with Communication error to RM

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957526#comment-16957526
 ] 

zhoukang commented on MAPREDUCE-7123:
-

What is the root cause? [~Amithsha] We also encountered this error, which can 
be fixed by upgrade the client.

> AM Failed with Communication error to RM
> 
>
> Key: MAPREDUCE-7123
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7123
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2, yarn
>Affects Versions: 2.9.0
>Reporter: Amithsha
>Priority: Major
>
> During the restart of nodemanagers in 300 node cluster some jobs failed with 
> the following exceptions.
> But the nodes where the AM launched is not the part of cluster.
> FATAL [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$UpdatedNodesTransition.transition(JobImpl.java:2146)
>  at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$UpdatedNodesTransition.transition(JobImpl.java:2139)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>  at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:998) 
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138) 
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1346)
>  at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1342)
>  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) 
> at java.lang.Thread.run(Thread.java:745) 2018-07-14 12:34:53,425 ERROR 
> [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleUpdatedNodes(RMContainerAllocator.java:875)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:776)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:256)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:281)
>  at java.lang.Thread.run(Thread.java:745) 2018-07-14 12:34:53,427 INFO 
> [AsyncDispatcher ShutDown handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7123) AM Failed with Communication error to RM

2018-07-24 Thread Amithsha (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555090#comment-16555090
 ] 

Amithsha commented on MAPREDUCE-7123:
-

This we found in nodemanager where 2.9.0 hadoop is installed and the framework 
path is pointed to 2.7.1.

This lead to job failures on the cluster randomly.

Ex:

During restart of this particular node will cause the AM to fail with following 
exception.

ERROR [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING 
RM. java.lang.NullPointerException at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleUpdatedNodes(RMContainerAllocator.java:800)
 at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:736)
 at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:244)
 at 
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:282)
 at java.lang.Thread.run(Thread.java:745) 2018-07-25 08:44:52,704 DEBUG [IPC 
Server handler 6 on 24503] org.apache.hadoop.ipc.Server: IPC Server handler 6 
on 24503: statusUpdate(attempt_1532432962204_0011_m_00_0, 
org.apache.hadoop.mapred.MapTaskStatus@32c2a068), rpc version=2, client 
version=19, methodsFingerPrint=937413979 from 10.3x.x.x:21017 Call#16 Retry#0 
for RpcKind RPC_WRITABLE 2018-07-25 08:44:52,704 FATAL [AsyncDispatcher event 
handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher 
thread java.lang.NullPointerException at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$UpdatedNodesTransition.transition(JobImpl.java:2144)
 at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$UpdatedNodesTransition.transition(JobImpl.java:2137)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996) at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138) at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
 at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
 at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) 
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
at java.lang.Thread.run(Thread.java:745)

 

> AM Failed with Communication error to RM
> 
>
> Key: MAPREDUCE-7123
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7123
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2, yarn
>Affects Versions: 2.9.0
>Reporter: Amithsha
>Priority: Major
>
> During the restart of nodemanagers in 300 node cluster some jobs failed with 
> the following exceptions.
> But the nodes where the AM launched is not the part of cluster.
> FATAL [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$UpdatedNodesTransition.transition(JobImpl.java:2146)
>  at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$UpdatedNodesTransition.transition(JobImpl.java:2139)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>  at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:998) 
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138) 
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1346)
>  at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1342)
>  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) 
> at java.lang.Thread.run(Thread.java:745) 2018-07-14 12:34:53,425 ERROR 
> [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.l

[jira] [Commented] (MAPREDUCE-7123) AM Failed with Communication error to RM

2018-07-16 Thread Amithsha (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546054#comment-16546054
 ] 

Amithsha commented on MAPREDUCE-7123:
-

>From the stack trace Found that the error is running from Mapreduce 2.7.1 
Where the Resource manager and nodemanager is running on 2.9.0.
So the client code is 2.7.1 and error is from that client code which may not be 
handled in 2.9.0.

> AM Failed with Communication error to RM
> 
>
> Key: MAPREDUCE-7123
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7123
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.9.0
>Reporter: Amithsha
>Priority: Major
>
> During the restart of nodemanagers in 300 node cluster some jobs failed with 
> the following exceptions.
> But the nodes where the AM launched is not the part of cluster.
> FATAL [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$UpdatedNodesTransition.transition(JobImpl.java:2146)
>  at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$UpdatedNodesTransition.transition(JobImpl.java:2139)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>  at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:998) 
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138) 
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1346)
>  at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1342)
>  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) 
> at java.lang.Thread.run(Thread.java:745) 2018-07-14 12:34:53,425 ERROR 
> [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleUpdatedNodes(RMContainerAllocator.java:875)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:776)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:256)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:281)
>  at java.lang.Thread.run(Thread.java:745) 2018-07-14 12:34:53,427 INFO 
> [AsyncDispatcher ShutDown handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7123) AM Failed with Communication error to RM

2018-07-16 Thread Amithsha (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546017#comment-16546017
 ] 

Amithsha commented on MAPREDUCE-7123:
-

[~jlowe], [~wangda] 

> AM Failed with Communication error to RM
> 
>
> Key: MAPREDUCE-7123
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7123
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.9.0
>Reporter: Amithsha
>Priority: Major
>
> During the restart of nodemanagers in 300 node cluster some jobs failed with 
> the following exceptions.
> But the nodes where the AM launched is not the part of cluster.
> FATAL [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$UpdatedNodesTransition.transition(JobImpl.java:2146)
>  at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$UpdatedNodesTransition.transition(JobImpl.java:2139)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>  at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:998) 
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138) 
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1346)
>  at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1342)
>  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) 
> at java.lang.Thread.run(Thread.java:745) 2018-07-14 12:34:53,425 ERROR 
> [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleUpdatedNodes(RMContainerAllocator.java:875)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:776)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:256)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:281)
>  at java.lang.Thread.run(Thread.java:745) 2018-07-14 12:34:53,427 INFO 
> [AsyncDispatcher ShutDown handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org