- The issue happens intermittently. However, once the Slider AM starts
   giving these timeout errors, it stays in that error mode. It then cannot be
   stopped (the stop command gives the same error). The only way out is to kill
   the Slider app using "yarn application -kill <Slider App ID>", which of
   course kills the entire app.
   - After the AM starts giving the timeouts, it is still possible to reach the
   AM host:port using "nc" etc., so it does not seem to be a network issue.
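
For anyone who wants to repeat the reachability check without "nc", here is a
minimal sketch in Python. The function name and the host/port arguments are
placeholders of mine, not anything from Slider. It only tests whether a TCP
connection can be opened, so a pass tells you the port is accepting
connections, not that the AM is actually answering RPC calls.

```python
import socket
import sys


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    This is only a connect test (roughly what "nc -z" does). It says
    nothing about whether the process behind the port answers requests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure, timeout.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_port.py <am_host> <am_port>
    host, port = sys.argv[1], int(sys.argv[2])
    print("open" if tcp_port_open(host, port) else "unreachable")
```

If this reports "open" while the slider commands still time out after 15
seconds, that is consistent with the connection being accepted but no RPC
response coming back, which matches the "connected" channel in the trace.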


On Thu, Sep 21, 2017 at 12:47 PM, Gour Saha <gs...@hortonworks.com> wrote:

> Just to see if the AM UI is accessible when the CLI fails. Seems like your
> issue is intermittent. RPC timeout for CLIs are set to 15 secs, so there
> could be several reasons for which the timeout occurs. Do you see any
> network/routing issue to connect to the host where the AM is running?
>
> -Gour
>
> On 9/21/17, 12:31 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>
> >Hi Gour,
> >
> >Will try to access the AM Web UI next time the issue happens. Is there
> >anything specific that should be checked within the AM UI ? Or is the test
> >just to see if AM UI is accessible at all ?
> >
> >Thanks,
> >
> >Manoj
> >
> >On Thu, Sep 21, 2017 at 11:26 AM, Gour Saha <gs...@hortonworks.com>
> wrote:
> >
> >> Are you able to go to the RM UI and load the ApplicationMaster web ui
> >>for
> >> this app?
> >>
> >> -Gour
> >>
> >> On 9/21/17, 11:00 AM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
> >>
> >> >Any thoughts ?
> >> >
> >> >On Mon, Sep 18, 2017 at 3:22 PM, Manoj Samel <manojsamelt...@gmail.com
> >
> >> >wrote:
> >> >
> >> >>
> >> >> CDH 5.5.1 cluster with Kerberos, slider version 0.80
> >> >>
> >> >> Sometimes Slider commands start hanging
> >> >>
> >> >> slider list <app> --containers
> >> >>
> >> >> [r...@s-76zyl02.sys.az1.eng.pdx.wd ~]# slider list spas --containers
> >> >> 2017-09-18 21:44:45,659 [main] INFO  tools.SliderUtils - JVM
> >>initialized
> >> >> into secure mode with kerberos realm BIGDATA
> >> >> Exception: Call From <host running command>/<host_ip> to
> >> >><slider_AM_HOST>
> >> >> failed on socket timeout exception: java.net.SocketTimeoutException:
> >> >> 15000 millis timeout while waiting for channel to be ready for read.
> >>ch
> >> >>:
> >> >> java.nio.channels.SocketChannel[connected local=/<slider
> >> >> command_host>:46777 remote=<host_running_slider_am>/<IP of host
> >>running
> >> >> slider am>:32120]; For more details see:  http://wiki.apache.org/
> >> >> hadoop/SocketTimeout
> >> >>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> >> >> Method)
> >> >>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(
> >> >> NativeConstructorAccessorImpl.java:62)
> >> >>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
> >> >> DelegatingConstructorAccessorImpl.java:45)
> >> >>     at
> >>java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> >> >>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(
> >> NetUtils.java:791)
> >> >>     at
> >>org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
> >> >>     at org.apache.hadoop.ipc.Client.call(Client.java:1476)
> >> >>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> >> >>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.
> >> >> invoke(ProtobufRpcEngine.java:230)
> >> >>     at com.sun.proxy.$Proxy19.getLiveContainers(Unknown Source)
> >> >>     at
> >> >>org.apache.slider.server.appmaster.rpc.SliderClusterProtocolProxy.
> >> >> getLiveContainers(SliderClusterProtocolProxy.java:229)
> >> >>     at
> >> >>org.apache.slider.client.ipc.SliderClusterOperations.getContainers(
> >> >> SliderClusterOperations.java:458)
> >> >>     at org.apache.slider.client.SliderClient.getContainers(
> >> >> SliderClient.java:2763)
> >> >>     at org.apache.slider.client.SliderClient.actionList(
> >> >> SliderClient.java:2735)
> >> >>     at org.apache.slider.client.SliderClient.exec(
> >> SliderClient.java:510)
> >> >>     at org.apache.slider.client.SliderClient.runService(
> >> >> SliderClient.java:424)
> >> >>     at org.apache.slider.core.main.ServiceLauncher.launchService(
> >> >> ServiceLauncher.java:188)
> >> >>     at
> >> >>org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(
> >> >> ServiceLauncher.java:475)
> >> >>     at org.apache.slider.core.main.ServiceLauncher.
> >> launchServiceAndExit(
> >> >> ServiceLauncher.java:403)
> >> >>     at org.apache.slider.core.main.ServiceLauncher.serviceMain(
> >> >> ServiceLauncher.java:630)
> >> >>     at org.apache.slider.Slider.main(Slider.java:49)
> >> >> Caused by: java.net.SocketTimeoutException: 15000 millis timeout
> >>while
> >> >> waiting for channel to be ready for read. ch :
> >> >>java.nio.channels.SocketChannel[connected
> >> >> local=/<Local_IP>:46777
> >> >>remote=<Slider_AM_HOST>/<slider_am_host_ip>:32120]
> >> >>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(
> >> >> SocketIOWithTimeout.java:164)
> >> >>     at org.apache.hadoop.net.SocketInputStream.read(
> >> >> SocketInputStream.java:161)
> >> >>     at org.apache.hadoop.net.SocketInputStream.read(
> >> >> SocketInputStream.java:131)
> >> >>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
> >> >>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
> >> >>     at org.apache.hadoop.ipc.Client$Connection$PingInputStream.
> >> >> read(Client.java:515)
> >> >>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:
> 246)
> >> >>     at java.io.BufferedInputStream.read(BufferedInputStream.java:
> 265)
> >> >>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
> >> >>     at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(
> >> >> Client.java:1075)
> >> >>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:970)
> >> >> 2017-09-18 21:45:01,499 [main] INFO  util.ExitUtil - Exiting with
> >> >>status 56
> >> >>
> >> >>
> >> >> Slider AM Log Shows no errors. The only warning I can see is about
> >>TGT
> >> >> renewer
> >> >>
> >> >> 2017-09-18 15:40:57,009 [TGT Renewer for xyz@mydomain] WARN
> >> >>  security.UserGroupInformation - Exception encountered while running
> >>the
> >> >> renewal command. Aborting renew thread. ExitCodeException exitCode=1:
> >> >> kinit: Ticket expired while renewing credentials
> >> >> 2017-09-18 15:43:29,536 [Socket Reader #1 for port 32120] INFO
> >> >>ipc.Server
> >> >> - Auth successful for xyz@mydomain (auth:SIMPLE)
> >> >> 2017-09-18 15:43:29,537 [Socket Reader #1 for port 32120] INFO
> >> >>authorize.ServiceAuthorizationManager
> >> >> - Authorization successful for xyz@mydomain (auth:TOKEN) for
> >> >> protocol=interface org.apache.slider.server.appmaster.rpc.
> >> >> SliderClusterProtocolPB
> >> >> 2017-09-18 15:48:29,569 [Socket Reader #1 for port 32120] INFO
> >> >>ipc.Server
> >> >> - Auth successful for xyz@mydomain (auth:SIMPLE)
> >> >> 2017-09-18 15:48:29,570 [Socket Reader #1 for port 32120] INFO
> >> >>authorize.ServiceAuthorizationManager
> >> >> - Authorization successful for xyz@mydomain (auth:TOKEN) for
> >> >> protocol=interface org.apache.slider.server.appmaster.rpc.
> >> >> SliderClusterProtocolPB
> >> >>
> >>
> >>
>
>
