Re: ResourceManager shutting down

2014-03-14 Thread Hitesh Shah
Hi John

Would you mind filing a JIRA with more details? The RM going down just because 
a host was not resolvable or DNS timed out is something that should be 
addressed.

thanks
-- Hitesh
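
To illustrate the point above, here is a minimal sketch of how an event loop could skip one event when a host name fails to resolve instead of shutting down. This is not the ResourceManager's code and not the actual YARN patch; `SafeDispatch`, `buildService`, and `handle` are hypothetical names used only for illustration. The only thing taken from the log in this thread is the exception shape: an IllegalArgumentException whose cause is a java.net.UnknownHostException.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class SafeDispatch {
    /** Walks the cause chain and reports whether it contains an UnknownHostException. */
    static boolean causedByUnknownHost(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical stand-in for per-node work that needs a resolvable address. */
    static String buildService(String host, int port) {
        try {
            return InetAddress.getByName(host).getHostAddress() + ":" + port;
        } catch (UnknownHostException e) {
            // Same wrapping shape as in the trace: IAE caused by UnknownHostException.
            throw new IllegalArgumentException(e);
        }
    }

    /** Runs one event; returns false (event skipped) if it failed on a DNS lookup. */
    static boolean handle(Runnable event) {
        try {
            event.run();
            return true;
        } catch (IllegalArgumentException e) {
            if (causedByUnknownHost(e)) {
                System.err.println("Skipping event, host not resolvable: " + e.getMessage());
                return false;
            }
            throw e; // unrelated argument errors still propagate
        }
    }

    public static void main(String[] args) {
        // An IP literal needs no DNS lookup, so this event succeeds.
        System.out.println(handle(() -> buildService("127.0.0.1", 8050)));
    }
}
```

Whether skipping, retrying, or marking the node unhealthy is the right reaction is a design decision this sketch does not try to settle.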

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:

> Never mind… we figured out its DNS entry was going missing.
> john
>  
> From: John Lilley [mailto:john.lil...@redpoint.net] 
> Sent: Thursday, March 13, 2014 2:52 PM
> To: user@hadoop.apache.org
> Subject: ResourceManager shutting down
>  
> We have this erratic behavior where every so often the RM will shut down with 
> an UnknownHostException.  The odd thing is, the host it complains about has 
> been in use for days at that point without problem.  Any ideas?
> Thanks,
> John
>  
>  
> 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - 
> application_1394204725813_0220 State change from ACCEPTED to RUNNING
> 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to 
> the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> skitzo.office.datalever.com
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
> at 
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
> ... 15 more
> 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:run(453)) - Exiting, bbye..
> 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
> selectchannelconnec...@metallica.office.datalever.com:8088
> 2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager 
> (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion 
> recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep 
> interrupted
> 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics 
> system...
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system 
> shutdown complete.
> 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher 
> (ApplicationMasterLauncher.java:run(98)) - 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
>  interrupted. Returning.
> 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping 
> server on 8141
> 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping 
> server on 8050
> … and so on, it shuts down
>  



RE: ResourceManager shutting down

2014-03-13 Thread Rohith Sharma K S
Hi Hitesh,

  Yes, it is an issue. This is handled by YARN-713 
(https://issues.apache.org/jira/i#browse/YARN-713), which fixes the DNS issue. 
The fix is available in hadoop-2.4 (unreleased).


Thanks & Regards
Rohith Sharma K S

-----Original Message-----
From: Hitesh Shah [mailto:hit...@apache.org] 
Sent: 14 March 2014 09:03
To: user@hadoop.apache.org
Subject: Re: ResourceManager shutting down

Hi John

Would you mind filing a JIRA with more details? The RM going down just because 
a host was not resolvable or DNS timed out is something that should be 
addressed.

thanks
-- Hitesh

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:

> Never mind... we figured out its DNS entry was going missing.
> john
>  
> From: John Lilley [mailto:john.lil...@redpoint.net]
> Sent: Thursday, March 13, 2014 2:52 PM
> To: user@hadoop.apache.org
> Subject: ResourceManager shutting down
>  
> We have this erratic behavior where every so often the RM will shut down with 
> an UnknownHostException.  The odd thing is, the host it complains about has 
> been in use for days at that point without problem.  Any ideas?
> Thanks,
> John
>  
>  
> 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl 
> (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State 
> change from ACCEPTED to RUNNING
> 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(449)) - Error in handling event type 
> NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> skitzo.office.datalever.com
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
> at 
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
> ... 15 more
> 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:run(453)) - Exiting, bbye..
> 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - 
> Stopped selectchannelconnec...@metallica.office.datalever.com:8088
> 2014-03-13 14:38:16,013 ERROR 
> delegation.AbstractDelegationTokenSecretManager 
> (AbstractDelegationTokenSecretManager.java:run(557)) - 
> InterruptedExcpetion recieved for ExpiredTokenRemover thread 
> java.lang.InterruptedException: sleep interrupted
> 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics 
> system...
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system 
> shutdown complete.
> 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher 
> (ApplicationMasterLauncher.java:run(98)) - 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
>  interrupted. Returning.
> 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - 
> Stopping server on 8141
> 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - 
> Stopping server on 8050
> ... and so on, it shuts down
>  



Re: ResourceManager shutting down

2014-03-13 Thread Jian He
Which Hadoop version are you running? This should have been fixed recently.

Jian


On Thu, Mar 13, 2014 at 8:33 PM, Hitesh Shah wrote:

> Hi John
>
> Would you mind filing a JIRA with more details? The RM going down just
> because a host was not resolvable or DNS timed out is something that should
> be addressed.
>
> thanks
> -- Hitesh
>
> On Mar 13, 2014, at 2:29 PM, John Lilley wrote:
>
> > Never mind... we figured out its DNS entry was going missing.
> > john
> >
> > From: John Lilley [mailto:john.lil...@redpoint.net]
> > Sent: Thursday, March 13, 2014 2:52 PM
> > To: user@hadoop.apache.org
> > Subject: ResourceManager shutting down
> >
> > We have this erratic behavior where every so often the RM will shut down
> > with an UnknownHostException.  The odd thing is, the host it complains
> > about has been in use for days at that point without problem.  Any ideas?
> > Thanks,
> > John
> >
> >
> > 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl
> (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change
> from ACCEPTED to RUNNING
> > 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager
> (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE
> to the scheduler
> > java.lang.IllegalArgumentException: java.net.UnknownHostException:
> skitzo.office.datalever.com
> > at
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
> > at
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
> > at
> org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
> > at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
> > at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
> > at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
> > at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
> > at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
> > at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
> > at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
> > at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
> > at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
> > at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
> > at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> > at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
> > ... 15 more
> > 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager
> (ResourceManager.java:run(453)) - Exiting, bbye..
> > 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) -
> Stopped selectchannelconnec...@metallica.office.datalever.com:8088
> > 2014-03-13 14:38:16,013 ERROR
> delegation.AbstractDelegationTokenSecretManager
> (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion
> recieved for ExpiredTokenRemover thread java.lang.InterruptedException:
> sleep interrupted
> > 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics
> system...
> > 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> > 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system
> shutdown complete.
> > 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher
> (ApplicationMasterLauncher.java:run(98)) -
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
> interrupted. Returning.
> > 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) -
> Stopping ser


RE: ResourceManager shutting down

2014-03-13 Thread John Lilley
Never mind... we figured out its DNS entry was going missing.
john
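
For anyone chasing a similar intermittent failure, a DNS entry that comes and goes can be caught with a small probe run periodically (for example from cron) on the ResourceManager host. The following is only a sketch under that assumption; `DnsProbe` and its methods are hypothetical names, and it uses nothing beyond the standard `java.net.InetAddress` lookup. It makes one pass over the host names given on the command line and reports any that do not resolve.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

class DnsProbe {
    /** Returns true if the host name resolves right now, false on UnknownHostException. */
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /** Probes each host once and returns the ones that failed to resolve. */
    static List<String> unresolved(List<String> hosts) {
        List<String> failed = new ArrayList<>();
        for (String host : hosts) {
            if (!resolves(host)) {
                failed.add(host);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Usage (hypothetical): java DnsProbe host1 host2 ...
        for (String host : unresolved(Arrays.asList(args))) {
            System.err.println(new Date() + " cannot resolve " + host);
        }
    }
}
```

Running each probe as a fresh process is deliberate: the JVM caches lookup results (see the `networkaddress.cache.ttl` security property), so a long-lived process polling in a tight loop could keep reporting a cached answer.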

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Thursday, March 13, 2014 2:52 PM
To: user@hadoop.apache.org
Subject: ResourceManager shutting down

We have this erratic behavior where every so often the RM will shut down with an 
UnknownHostException.  The odd thing is, the host it complains about has been 
in use for days at that point without problem.  Any ideas?
Thanks,
John


2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - 
application_1394204725813_0220 State change from ACCEPTED to RUNNING
2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager 
(ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to 
the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: 
skitzo.office.datalever.com
at 
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
at 
org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
... 15 more
2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager 
(ResourceManager.java:run(453)) - Exiting, bbye..
2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
selectchannelconnec...@metallica.office.datalever.com:8088
2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion 
recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep 
interrupted
2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system 
shutdown complete.
2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher 
(ApplicationMasterLauncher.java:run(98)) - 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
 interrupted. Returning.
2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping 
server on 8141
2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping 
server on 8050
... and so on, it shuts down



ResourceManager shutting down

2014-03-13 Thread John Lilley
We have this erratic behavior where every so often the RM will shut down with an 
UnknownHostException.  The odd thing is, the host it complains about has been 
in use for days at that point without problem.  Any ideas?
Thanks,
John


2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - 
application_1394204725813_0220 State change from ACCEPTED to RUNNING
2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager 
(ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to 
the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: 
skitzo.office.datalever.com
at 
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
at 
org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
... 15 more
2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager 
(ResourceManager.java:run(453)) - Exiting, bbye..
2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
selectchannelconnec...@metallica.office.datalever.com:8088
2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion 
recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep 
interrupted
2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system 
shutdown complete.
2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher 
(ApplicationMasterLauncher.java:run(98)) - 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
 interrupted. Returning.
2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping 
server on 8141
2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping 
server on 8050
... and so on, it shuts down