- The issue happens intermittently. However, once the Slider AM starts giving these timeout errors, it stays in that error state and cannot be stopped (the stop command gives the same error). The only way out is to kill the Slider app using "yarn application -kill <Slider App ID>", which of course kills the entire app.
- After the AM starts giving the timeouts, it is still possible to reach the AM host:port using "nc" etc., so it does not seem to be a network issue.
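
For anyone who wants to repeat the "nc" check from a script, here is a minimal sketch, assuming Python is available on the host where the slider command is run. The helper name `tcp_port_open` is hypothetical, and the host and port have to be taken from the "remote=" part of the error message. A successful connect only shows that the port accepts TCP connections; it does not show that the AM answers RPC calls, which is consistent with the client timing out on read rather than on connect.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    This is only a reachability check (roughly what a plain "nc" connect
    does). It does not speak the Hadoop RPC protocol, so it cannot tell
    whether the AM is actually servicing requests.
    """
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect attempt; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures
        # and connect timeouts.
        return False
```

For example, `tcp_port_open("<slider_AM_HOST>", 32120)` with the real AM host substituted in. If this returns True while the CLI still times out after 15 seconds, the connection is being accepted but no response is coming back.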

On Thu, Sep 21, 2017 at 12:47 PM, Gour Saha <gs...@hortonworks.com> wrote:

> Just to see if the AM UI is accessible when the CLI fails. Seems like your
> issue is intermittent. RPC timeout for CLIs are set to 15 secs, so there
> could be several reasons for which the timeout occurs. Do you see any
> network/routing issue to connect to the host where the AM is running?
>
> -Gour
>
> On 9/21/17, 12:31 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>
>> Hi Gour,
>>
>> Will try to access the AM Web UI next time the issue happens. Is there
>> anything specific that should be checked within the AM UI ? Or is the test
>> just to see if AM UI is accessible at all ?
>>
>> Thanks,
>>
>> Manoj
>>
>> On Thu, Sep 21, 2017 at 11:26 AM, Gour Saha <gs...@hortonworks.com> wrote:
>>
>>> Are you able to go to the RM UI and load the ApplicationMaster web ui
>>> for this app?
>>>
>>> -Gour
>>>
>>> On 9/21/17, 11:00 AM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>
>>>> Any thoughts ?
>>>>
>>>> On Mon, Sep 18, 2017 at 3:22 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>>>>
>>>>> CDH 5.5.1 cluster with Kerberos, slider version 0.80
>>>>>
>>>>> Sometimes Slider commands start hanging
>>>>>
>>>>> slider list <app> --containers
>>>>>
>>>>> [r...@s-76zyl02.sys.az1.eng.pdx.wd ~]# slider list spas --containers
>>>>> 2017-09-18 21:44:45,659 [main] INFO tools.SliderUtils - JVM initialized into secure mode with kerberos realm BIGDATA
>>>>> Exception: Call From <host running command>/<host_ip> to <slider_AM_HOST> failed on socket timeout exception: java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<slider command_host>:46777 remote=<host_running_slider_am>/<IP of host running slider am>:32120]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>>>>>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
>>>>>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>>>>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>>>>>     at com.sun.proxy.$Proxy19.getLiveContainers(Unknown Source)
>>>>>     at org.apache.slider.server.appmaster.rpc.SliderClusterProtocolProxy.getLiveContainers(SliderClusterProtocolProxy.java:229)
>>>>>     at org.apache.slider.client.ipc.SliderClusterOperations.getContainers(SliderClusterOperations.java:458)
>>>>>     at org.apache.slider.client.SliderClient.getContainers(SliderClient.java:2763)
>>>>>     at org.apache.slider.client.SliderClient.actionList(SliderClient.java:2735)
>>>>>     at org.apache.slider.client.SliderClient.exec(SliderClient.java:510)
>>>>>     at org.apache.slider.client.SliderClient.runService(SliderClient.java:424)
>>>>>     at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>>>>>     at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>>>>>     at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>>>>>     at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>>>>>     at org.apache.slider.Slider.main(Slider.java:49)
>>>>> Caused by: java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<Local_IP>:46777 remote=<Slider_AM_HOST>/<slider_am_host_ip>:32120]
>>>>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>>>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>>>>>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
>>>>>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
>>>>>     at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:515)
>>>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>>>>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>>>     at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1075)
>>>>>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:970)
>>>>> 2017-09-18 21:45:01,499 [main] INFO util.ExitUtil - Exiting with status 56
>>>>>
>>>>> Slider AM Log Shows no errors. The only warning I can see is about TGT renewer
>>>>>
>>>>> 2017-09-18 15:40:57,009 [TGT Renewer for xyz@mydomain] WARN security.UserGroupInformation - Exception encountered while running the renewal command. Aborting renew thread. ExitCodeException exitCode=1: kinit: Ticket expired while renewing credentials
>>>>> 2017-09-18 15:43:29,536 [Socket Reader #1 for port 32120] INFO ipc.Server - Auth successful for xyz@mydomain (auth:SIMPLE)
>>>>> 2017-09-18 15:43:29,537 [Socket Reader #1 for port 32120] INFO authorize.ServiceAuthorizationManager - Authorization successful for xyz@mydomain (auth:TOKEN) for protocol=interface org.apache.slider.server.appmaster.rpc.SliderClusterProtocolPB
>>>>> 2017-09-18 15:48:29,569 [Socket Reader #1 for port 32120] INFO ipc.Server - Auth successful for xyz@mydomain (auth:SIMPLE)
>>>>> 2017-09-18 15:48:29,570 [Socket Reader #1 for port 32120] INFO authorize.ServiceAuthorizationManager - Authorization successful for xyz@mydomain (auth:TOKEN) for protocol=interface org.apache.slider.server.appmaster.rpc.SliderClusterProtocolPB