Re: ResourceManager shutting down
Hi John,

Would you mind filing a JIRA with more details? The RM going down just because a host was not resolvable or DNS timed out is something that should be addressed.

thanks
-- Hitesh

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:
> Never mind… we figured out its DNS entry was going missing.
> john
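Hitesh's point is that one unresolvable host should not take down the whole RM. In John's trace the exception escapes the scheduler's NODE_UPDATE handling and the RM exits. The general idea of containing such a failure can be sketched as follows. This is a hypothetical illustration in plain Java (the class, method, and logging are made up for this example); it is not how YARN's dispatcher is written and not a description of any upstream fix.

```java
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class GuardedDispatchSketch {

    /**
     * Processes hypothetical node-update events (represented here only by the
     * node's hostname) one at a time. If handling one node throws
     * IllegalArgumentException (e.g. its hostname no longer resolves), the
     * failure is logged and the loop continues instead of exiting.
     * Returns the hosts whose events failed, so they could be retried later.
     */
    public static List<String> dispatch(List<String> nodeUpdates, Consumer<String> handler) {
        List<String> failed = new ArrayList<>();
        for (String host : nodeUpdates) {
            try {
                handler.accept(host);
            } catch (IllegalArgumentException e) {
                // Hypothetical policy: log and skip this node for now.
                System.err.println("Skipping node update for " + host + ": " + e);
                failed.add(host);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> allocated = new ArrayList<>();
        Consumer<String> handler = host -> {
            if (host.startsWith("skitzo")) {
                // Simulate the unresolvable host from the log.
                throw new IllegalArgumentException(new UnknownHostException(host));
            }
            allocated.add(host);
        };
        List<String> failed =
            dispatch(List.of("node1", "skitzo.office.datalever.com", "node2"), handler);
        // prints: allocated=[node1, node2] failed=[skitzo.office.datalever.com]
        System.out.println("allocated=" + allocated + " failed=" + failed);
    }
}
```

Catching only IllegalArgumentException keeps the guard narrow, so unrelated bugs still fail loudly; whether a failed node is skipped, retried, or marked unhealthy is a policy decision this sketch leaves open.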
RE: ResourceManager shutting down
Hi Hitesh,

Yes, it is an issue. It is handled in YARN-713 (https://issues.apache.org/jira/i#browse/YARN-713), which fixes the DNS issue. The fix is available in hadoop-2.4 (unreleased).

Thanks & Regards
Rohith Sharma K S

-----Original Message-----
From: Hitesh Shah [mailto:hit...@apache.org]
Sent: 14 March 2014 09:03
To: user@hadoop.apache.org
Subject: Re: ResourceManager shutting down

Hi John Would you mind filing a jira with more details. The RM going down just because a host was not resolvable or DNS timed out is something that should be addressed. thanks -- Hitesh
Re: ResourceManager shutting down
Which Hadoop version are you running? This should have been fixed recently.

Jian

On Thu, Mar 13, 2014 at 8:33 PM, Hitesh Shah wrote:
> Hi John
>
> Would you mind filing a jira with more details. The RM going down just
> because a host was not resolvable or DNS timed out is something that should
> be addressed.
>
> thanks
> -- Hitesh
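Jian's question matters because, per Rohith, the fix targets hadoop-2.4. On the command line, `hadoop version` prints the installed release. If the same check is needed in code, a minimal numeric version comparison could look like the following. VersionCheck is a hypothetical helper, and vendor distributions may backport fixes, so a version number alone does not prove whether a given fix is present.

```java
public class VersionCheck {

    /**
     * Compares dotted numeric versions such as "2.2.0" and "2.4.0".
     * Within each component only the leading digits count, so suffixes are
     * ignored ("2.4.0-SNAPSHOT" compares equal to "2.4.0"). Missing
     * components count as 0. Returns negative, zero, or positive.
     */
    public static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? leadingInt(pa[i]) : 0;
            int y = i < pb.length ? leadingInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // Parses the leading digits of one component; "0-101" -> 0, "beta" -> 0.
    private static int leadingInt(String s) {
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        return end == 0 ? 0 : Integer.parseInt(s.substring(0, end));
    }

    /** True if the running version is the same as or newer than the required one. */
    public static boolean atLeast(String running, String required) {
        return compare(running, required) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(atLeast("2.2.0", "2.4.0")); // prints: false
    }
}
```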
RE: ResourceManager shutting down
Never mind... we figured out its DNS entry was going missing.

john

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Thursday, March 13, 2014 2:52 PM
To: user@hadoop.apache.org
Subject: ResourceManager shutting down
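John's diagnosis (a DNS entry that intermittently disappears) can be confirmed from the RM host by resolving the NodeManager hostnames periodically and logging any misses. A minimal sketch, assuming the hostnames are passed on the command line and a one-minute interval is acceptable; DnsProbe and its Resolver interface are hypothetical names. Because the JVM caches lookups (controlled by the security properties networkaddress.cache.ttl and networkaddress.cache.negative.ttl), the sketch disables both caches so each pass performs a fresh lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class DnsProbe {

    /** Resolution function, injectable so the probe can be tested offline. */
    public interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /** Returns the hosts that failed to resolve in this pass, in input order. */
    public static List<String> unresolvedHosts(List<String> hosts, Resolver resolver) {
        List<String> missing = new ArrayList<>();
        for (String host : hosts) {
            try {
                resolver.resolve(host);
            } catch (UnknownHostException e) {
                missing.add(host);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws InterruptedException {
        // Disable positive and negative DNS caching before any lookup happens,
        // so a cached address cannot hide a vanished entry (and vice versa).
        Security.setProperty("networkaddress.cache.ttl", "0");
        Security.setProperty("networkaddress.cache.negative.ttl", "0");

        List<String> hosts = List.of(args); // NodeManager hostnames to watch
        while (true) {
            List<String> missing = unresolvedHosts(hosts, InetAddress::getByName);
            if (!missing.isEmpty()) {
                System.out.println(Instant.now() + " unresolved: " + missing);
            }
            Thread.sleep(60_000); // check once a minute
        }
    }
}
```

For example, `java DnsProbe skitzo.office.datalever.com` run on the RM machine would print a timestamped line whenever that name stops resolving. The Resolver parameter exists only so the check can be exercised without a network.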
ResourceManager shutting down
We have this erratic behavior where every so often the RM will shut down with an UnknownHostException. The odd thing is, the host it complains about had been in use for days at that point without problem. Any ideas?

Thanks,
John

2014-03-13 14:38:14,746 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change from ACCEPTED to RUNNING
2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
        at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
        at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
        ... 15 more
2014-03-13 14:38:15,794 INFO resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
2014-03-13 14:38:15,911 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped selectchannelconnec...@metallica.office.datalever.com:8088
2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
2014-03-13 14:38:16,013 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
2014-03-13 14:38:16,014 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
2014-03-13 14:38:16,014 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
2014-03-13 14:38:16,015 WARN amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2014-03-13 14:38:16,015 INFO ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
2014-03-13 14:38:16,017 INFO ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
... and so on; it shuts down.
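Reading the trace above bottom-up: while handling a NODE_UPDATE event, the CapacityScheduler assigns a container on the node, builds a container token for it, and SecurityUtil.buildTokenService fails because the node's hostname no longer resolves. The simplified sketch below reproduces that shape of failure using only the JDK. buildServiceString is a hypothetical stand-in, not Hadoop's source, and port 45454 is an arbitrary example value.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class TokenServiceSketch {

    /**
     * Hypothetical stand-in for building an "ip:port" service string from a
     * node address. It only mirrors the failure mode in the trace: an
     * unresolved address becomes an unchecked IllegalArgumentException that
     * wraps an UnknownHostException naming the host.
     */
    public static String buildServiceString(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(new UnknownHostException(addr.getHostName()));
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        // createUnresolved performs no DNS lookup, which simulates a
        // NodeManager whose DNS entry has gone missing.
        InetSocketAddress missing =
            InetSocketAddress.createUnresolved("skitzo.office.datalever.com", 45454);
        try {
            buildServiceString(missing);
        } catch (IllegalArgumentException e) {
            // prints: java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
            System.out.println(e);
        }
    }
}
```

Because the exception is unchecked, nothing between the token code and the scheduler's event loop is forced to handle it, which is consistent with the FATAL line and the "Exiting, bbye.." that follows in the log.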