Any thoughts?

On Mon, Sep 18, 2017 at 3:22 PM, Manoj Samel <manojsamelt...@gmail.com>
wrote:

>
> CDH 5.5.1 cluster with Kerberos, Slider version 0.80.
>
> Sometimes Slider commands start hanging, for example:
>
> slider list <app> --containers
>
> [r...@s-76zyl02.sys.az1.eng.pdx.wd ~]# slider list spas --containers
> 2017-09-18 21:44:45,659 [main] INFO  tools.SliderUtils - JVM initialized
> into secure mode with kerberos realm BIGDATA
> Exception: Call From <host running command>/<host_ip> to <slider_AM_HOST>
> failed on socket timeout exception: java.net.SocketTimeoutException:
> 15000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/<slider
> command_host>:46777 remote=<host_running_slider_am>/<IP of host running
> slider am>:32120]; For more details see:
> http://wiki.apache.org/hadoop/SocketTimeout
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>     at com.sun.proxy.$Proxy19.getLiveContainers(Unknown Source)
>     at org.apache.slider.server.appmaster.rpc.SliderClusterProtocolProxy.getLiveContainers(SliderClusterProtocolProxy.java:229)
>     at org.apache.slider.client.ipc.SliderClusterOperations.getContainers(SliderClusterOperations.java:458)
>     at org.apache.slider.client.SliderClient.getContainers(SliderClient.java:2763)
>     at org.apache.slider.client.SliderClient.actionList(SliderClient.java:2735)
>     at org.apache.slider.client.SliderClient.exec(SliderClient.java:510)
>     at org.apache.slider.client.SliderClient.runService(SliderClient.java:424)
>     at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>     at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>     at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>     at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>     at org.apache.slider.Slider.main(Slider.java:49)
> Caused by: java.net.SocketTimeoutException: 15000 millis timeout while
> waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected
> local=/<Local_IP>:46777 remote=<Slider_AM_HOST>/<slider_am_host_ip>:32120]
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
>     at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:515)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>     at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1075)
>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:970)
> 2017-09-18 21:45:01,499 [main] INFO  util.ExitUtil - Exiting with status 56
>
>
> The Slider AM log shows no errors. The only warning I can see is from the TGT
> renewer:
>
> 2017-09-18 15:40:57,009 [TGT Renewer for xyz@mydomain] WARN  security.UserGroupInformation - Exception encountered while running the renewal command. Aborting renew thread. ExitCodeException exitCode=1: kinit: Ticket expired while renewing credentials
> 2017-09-18 15:43:29,536 [Socket Reader #1 for port 32120] INFO  ipc.Server - Auth successful for xyz@mydomain (auth:SIMPLE)
> 2017-09-18 15:43:29,537 [Socket Reader #1 for port 32120] INFO  authorize.ServiceAuthorizationManager - Authorization successful for xyz@mydomain (auth:TOKEN) for protocol=interface org.apache.slider.server.appmaster.rpc.SliderClusterProtocolPB
> 2017-09-18 15:48:29,569 [Socket Reader #1 for port 32120] INFO  ipc.Server - Auth successful for xyz@mydomain (auth:SIMPLE)
> 2017-09-18 15:48:29,570 [Socket Reader #1 for port 32120] INFO  authorize.ServiceAuthorizationManager - Authorization successful for xyz@mydomain (auth:TOKEN) for protocol=interface org.apache.slider.server.appmaster.rpc.SliderClusterProtocolPB
>
