I found a similar question answered here: https://community.hortonworks.com/questions/53776/warn-securityusergroupinformation-exception-encoun.html
It looks like this is related to Kerberos ticket renewal or expiration. Maybe you need to kinit again, or you did kinit again but only after the ticket had already expired, or the renewable lifetime of the ticket has been exceeded. I suggest running kdestroy before issuing another kinit to see if that addresses the issue. You may also want to check the renewal settings on the KDC.

On Fri, Sep 22, 2017 at 2:32 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:

> - The issue happens intermittently. HOWEVER, once the Slider AM starts
>   giving these timeout errors, it stays in that error mode. Then it cannot be
>   stopped (stop command gives same error). The only way is to kill the slider
>   App using "yarn application -kill <Slider App ID>", which of course kills
>   the entire app.
> - After AM starts giving the timeouts, it is still possible to ping AM
>   host:port using "nc" etc., so it does not seem to be a network issue.
>
> On Thu, Sep 21, 2017 at 12:47 PM, Gour Saha <gs...@hortonworks.com> wrote:
>
> > Just to see if the AM UI is accessible when the CLI fails. Seems like your
> > issue is intermittent. RPC timeout for CLIs is set to 15 secs, so there
> > could be several reasons for which the timeout occurs. Do you see any
> > network/routing issue to connect to the host where the AM is running?
> >
> > -Gour
> >
> > On 9/21/17, 12:31 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
> >
> > >Hi Gour,
> > >
> > >Will try to access the AM Web UI next time the issue happens. Is there
> > >anything specific that should be checked within the AM UI ? Or is the test
> > >just to see if AM UI is accessible at all ?
> > >
> > >Thanks,
> > >
> > >Manoj
> > >
> > >On Thu, Sep 21, 2017 at 11:26 AM, Gour Saha <gs...@hortonworks.com> wrote:
> > >
> > >> Are you able to go to the RM UI and load the ApplicationMaster web ui for
> > >> this app?
> > >>
> > >> -Gour
> > >>
> > >> On 9/21/17, 11:00 AM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
> > >>
> > >> >Any thoughts ?
> > >> >
> > >> >On Mon, Sep 18, 2017 at 3:22 PM, Manoj Samel <manojsamelt...@gmail.com>
> > >> >wrote:
> > >> >
> > >> >> CDH 5.5.1 cluster with Kerberos, slider version 0.80
> > >> >>
> > >> >> Sometimes Slider commands start hanging
> > >> >>
> > >> >> slider list <app> --containers
> > >> >>
> > >> >> [r...@s-76zyl02.sys.az1.eng.pdx.wd ~]# slider list spas --containers
> > >> >> 2017-09-18 21:44:45,659 [main] INFO tools.SliderUtils - JVM initialized into secure mode with kerberos realm BIGDATA
> > >> >> Exception: Call From <host running command>/<host_ip> to <slider_AM_HOST> failed on socket timeout exception: java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<slider command_host>:46777 remote=<host_running_slider_am>/<IP of host running slider am>:32120]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
> > >> >>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > >> >>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> > >> >>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> > >> >>     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> > >> >>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
> > >> >>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
> > >> >>     at org.apache.hadoop.ipc.Client.call(Client.java:1476)
> > >> >>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> > >> >>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> > >> >>     at com.sun.proxy.$Proxy19.getLiveContainers(Unknown Source)
> > >> >>     at org.apache.slider.server.appmaster.rpc.SliderClusterProtocolProxy.getLiveContainers(SliderClusterProtocolProxy.java:229)
> > >> >>     at org.apache.slider.client.ipc.SliderClusterOperations.getContainers(SliderClusterOperations.java:458)
> > >> >>     at org.apache.slider.client.SliderClient.getContainers(SliderClient.java:2763)
> > >> >>     at org.apache.slider.client.SliderClient.actionList(SliderClient.java:2735)
> > >> >>     at org.apache.slider.client.SliderClient.exec(SliderClient.java:510)
> > >> >>     at org.apache.slider.client.SliderClient.runService(SliderClient.java:424)
> > >> >>     at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
> > >> >>     at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
> > >> >>     at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
> > >> >>     at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
> > >> >>     at org.apache.slider.Slider.main(Slider.java:49)
> > >> >> Caused by: java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<Local_IP>:46777 remote=<Slider_AM_HOST>/<slider_am_host_ip>:32120]
> > >> >>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > >> >>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> > >> >>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> > >> >>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > >> >>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > >> >>     at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:515)
> > >> >>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> > >> >>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> > >> >>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > >> >>     at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1075)
> > >> >>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:970)
> > >> >> 2017-09-18 21:45:01,499 [main] INFO util.ExitUtil - Exiting with status 56
> > >> >>
> > >> >> Slider AM Log Shows no errors. The only warning I can see is about TGT
> > >> >> renewer
> > >> >>
> > >> >> 2017-09-18 15:40:57,009 [TGT Renewer for xyz@mydomain] WARN security.UserGroupInformation - Exception encountered while running the renewal command. Aborting renew thread. ExitCodeException exitCode=1: kinit: Ticket expired while renewing credentials
> > >> >> 2017-09-18 15:43:29,536 [Socket Reader #1 for port 32120] INFO ipc.Server - Auth successful for xyz@mydomain (auth:SIMPLE)
> > >> >> 2017-09-18 15:43:29,537 [Socket Reader #1 for port 32120] INFO authorize.ServiceAuthorizationManager - Authorization successful for xyz@mydomain (auth:TOKEN) for protocol=interface org.apache.slider.server.appmaster.rpc.SliderClusterProtocolPB
> > >> >> 2017-09-18 15:48:29,569 [Socket Reader #1 for port 32120] INFO ipc.Server - Auth successful for xyz@mydomain (auth:SIMPLE)
> > >> >> 2017-09-18 15:48:29,570 [Socket Reader #1 for port 32120] INFO authorize.ServiceAuthorizationManager - Authorization successful for xyz@mydomain (auth:TOKEN) for protocol=interface org.apache.slider.server.appmaster.rpc.SliderClusterProtocolPB
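
For reference, the kdestroy/kinit sequence suggested at the top of this message might look roughly like the sketch below. It assumes the MIT Kerberos client tools (klist, kdestroy, kinit, kadmin) are installed. The function names refresh_ticket and show_renew_limit are made up for illustration, and the principal is only the placeholder that appears in the AM log, so substitute whatever account actually runs the slider commands.

```shell
#!/bin/sh
# Sketch only. Assumes the MIT Kerberos client tools are on the PATH.
# PRINCIPAL is a placeholder copied from the AM log; replace it with the real account.
PRINCIPAL="${PRINCIPAL:-xyz@mydomain}"

# Hypothetical helper: drop the old credential cache and fetch a fresh TGT.
refresh_ticket() {
    klist                 # current cache: compare "Expires" with "renew until"
    kdestroy              # discard the stale cache
    kinit "$PRINCIPAL"    # obtain a new TGT (prompts for the password)
    klist                 # confirm the new expiry and renewable lifetime
}

# Hypothetical helper: print the KDC-side renewable-life limit for a principal.
# Needs kadmin privileges; worth checking for both the user principal and the
# realm's krbtgt principal, since either limit can cap the renewable lifetime.
show_renew_limit() {
    kadmin -q "getprinc $1" | grep -i "renewable life"
}
```

After sourcing the file, run refresh_ticket as the user that issues the slider commands, then retry "slider list <app> --containers". If the "renew until" time shown by klist matches the ticket's expiry time, the ticket was not issued as renewable, and the KDC-side limit printed by show_renew_limit is the next thing to look at.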