Environment: CDH 5.5.1 cluster with Kerberos, Slider version 0.80.

Sometimes Slider commands start hanging and eventually fail with a socket timeout, for example slider list <app> --containers:

[r...@s-76zyl02.sys.az1.eng.pdx.wd ~]# slider list spas --containers
2017-09-18 21:44:45,659 [main] INFO tools.SliderUtils - JVM initialized into secure mode with kerberos realm BIGDATA
Exception: Call From <host running command>/<host_ip> to <slider_AM_HOST> failed on socket timeout exception: java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<slider command_host>:46777 remote=<host_running_slider_am>/<IP of host running slider am>:32120]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
    at org.apache.hadoop.ipc.Client.call(Client.java:1476)
    at org.apache.hadoop.ipc.Client.call(Client.java:1403)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy19.getLiveContainers(Unknown Source)
    at org.apache.slider.server.appmaster.rpc.SliderClusterProtocolProxy.getLiveContainers(SliderClusterProtocolProxy.java:229)
    at org.apache.slider.client.ipc.SliderClusterOperations.getContainers(SliderClusterOperations.java:458)
    at org.apache.slider.client.SliderClient.getContainers(SliderClient.java:2763)
    at org.apache.slider.client.SliderClient.actionList(SliderClient.java:2735)
    at org.apache.slider.client.SliderClient.exec(SliderClient.java:510)
    at org.apache.slider.client.SliderClient.runService(SliderClient.java:424)
    at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
    at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
    at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
    at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
    at org.apache.slider.Slider.main(Slider.java:49)
Caused by: java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<Local_IP>:46777 remote=<Slider_AM_HOST>/<slider_am_host_ip>:32120]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:515)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1075)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:970)
2017-09-18 21:45:01,499 [main] INFO util.ExitUtil - Exiting with status 56

The Slider AM log shows no errors. The only warning I can see is about the TGT renewer:

2017-09-18 15:40:57,009 [TGT Renewer for xyz@mydomain] WARN security.UserGroupInformation - Exception encountered while running the renewal command. Aborting renew thread.
ExitCodeException exitCode=1: kinit: Ticket expired while renewing credentials
2017-09-18 15:43:29,536 [Socket Reader #1 for port 32120] INFO ipc.Server - Auth successful for xyz@mydomain (auth:SIMPLE)
2017-09-18 15:43:29,537 [Socket Reader #1 for port 32120] INFO authorize.ServiceAuthorizationManager - Authorization successful for xyz@mydomain (auth:TOKEN) for protocol=interface org.apache.slider.server.appmaster.rpc.SliderClusterProtocolPB
2017-09-18 15:48:29,569 [Socket Reader #1 for port 32120] INFO ipc.Server - Auth successful for xyz@mydomain (auth:SIMPLE)
2017-09-18 15:48:29,570 [Socket Reader #1 for port 32120] INFO authorize.ServiceAuthorizationManager - Authorization successful for xyz@mydomain (auth:TOKEN) for protocol=interface org.apache.slider.server.appmaster.rpc.SliderClusterProtocolPB