Are you able to go to the RM UI and load the ApplicationMaster web UI for this app?
-Gour

On 9/21/17, 11:00 AM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:

>Any thoughts ?
>
>On Mon, Sep 18, 2017 at 3:22 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>
>> CDH 5.5.1 cluster with Kerberos, slider version 0.80
>>
>> Sometimes Slider commands start hanging
>>
>> slider list <app> --containers
>>
>> [r...@s-76zyl02.sys.az1.eng.pdx.wd ~]# slider list spas --containers
>> 2017-09-18 21:44:45,659 [main] INFO tools.SliderUtils - JVM initialized into secure mode with kerberos realm BIGDATA
>> Exception: Call From <host running command>/<host_ip> to <slider_AM_HOST> failed on socket timeout exception: java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<slider command_host>:46777 remote=<host_running_slider_am>/<IP of host running slider am>:32120]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
>>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>>     at com.sun.proxy.$Proxy19.getLiveContainers(Unknown Source)
>>     at org.apache.slider.server.appmaster.rpc.SliderClusterProtocolProxy.getLiveContainers(SliderClusterProtocolProxy.java:229)
>>     at org.apache.slider.client.ipc.SliderClusterOperations.getContainers(SliderClusterOperations.java:458)
>>     at org.apache.slider.client.SliderClient.getContainers(SliderClient.java:2763)
>>     at org.apache.slider.client.SliderClient.actionList(SliderClient.java:2735)
>>     at org.apache.slider.client.SliderClient.exec(SliderClient.java:510)
>>     at org.apache.slider.client.SliderClient.runService(SliderClient.java:424)
>>     at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>>     at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>>     at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>>     at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>>     at org.apache.slider.Slider.main(Slider.java:49)
>> Caused by: java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<Local_IP>:46777 remote=<Slider_AM_HOST>/<slider_am_host_ip>:32120]
>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
>>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
>>     at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:515)
>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>     at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1075)
>>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:970)
>> 2017-09-18 21:45:01,499 [main] INFO util.ExitUtil - Exiting with status 56
>>
>>
>> Slider AM Log Shows no errors. The only warning I can see is about TGT renewer
>>
>> 2017-09-18 15:40:57,009 [TGT Renewer for xyz@mydomain] WARN security.UserGroupInformation - Exception encountered while running the renewal command. Aborting renew thread. ExitCodeException exitCode=1: kinit: Ticket expired while renewing credentials
>> 2017-09-18 15:43:29,536 [Socket Reader #1 for port 32120] INFO ipc.Server - Auth successful for xyz@mydomain (auth:SIMPLE)
>> 2017-09-18 15:43:29,537 [Socket Reader #1 for port 32120] INFO authorize.ServiceAuthorizationManager - Authorization successful for xyz@mydomain (auth:TOKEN) for protocol=interface org.apache.slider.server.appmaster.rpc.SliderClusterProtocolPB
>> 2017-09-18 15:48:29,569 [Socket Reader #1 for port 32120] INFO ipc.Server - Auth successful for xyz@mydomain (auth:SIMPLE)
>> 2017-09-18 15:48:29,570 [Socket Reader #1 for port 32120] INFO authorize.ServiceAuthorizationManager - Authorization successful for xyz@mydomain (auth:TOKEN) for protocol=interface org.apache.slider.server.appmaster.rpc.SliderClusterProtocolPB
>>