Hi,

I have uploaded the config files; hope these shed light on the TOKEN
authentication issue.

As a side note - it seems that commands like "slider list <app>
--containers" etc. are now **significantly** slower (compared to when
slider-client.xml was not empty and had a few properties). The commands
sometimes take 1 minute on the same cluster where they used to take a few
seconds before. Also, the first command executed after some inactivity
takes a long time to execute, while the same command repeated immediately
returns quickly. The same is observed when the Slider AM restarts (e.g.
due to an upgrade). This slowness was not present when slider-client.xml
had config parameters like the registry zookeepers and the RM address. Why
would there be such a difference for the first execution when all config
is read from the HADOOP_CONF_DIR files?

Following is the output of "slider list <xxx> --containers" executed twice.
Note that the first run took almost a minute; the second was almost
instantaneous.

[root@... ~]# slider list foo --containers
2016-07-29 23:30:35,197 [main] INFO  tools.SliderUtils - JVM initialized
into secure mode with kerberos realm xxx
2016-07-29 23:31:22,035 [main] INFO
 client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2016-07-29 23:31:22,162 [main] INFO  util.ExitUtil - Exiting with status 0
foo                               RUNNING  application_1469834604094_0001
           http://xxx:23188/proxy/application_1469834604094_0001/
......
[root@... ~]# slider list foo --containers
2016-07-29 23:32:34,816 [main] INFO  tools.SliderUtils - JVM initialized
into secure mode with kerberos realm xxx
2016-07-29 23:32:35,775 [main] INFO
 client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2016-07-29 23:32:35,896 [main] INFO  util.ExitUtil - Exiting with status 0
foo                               RUNNING  application_1469834604094_0001
           http://xxx:23188/proxy/application_1469834604094_0001/
..

Thanks,

On Thu, Jul 28, 2016 at 7:01 PM, Manoj Samel <manojsamelt...@gmail.com>
wrote:

> Hi Gour,
>
> I added the following properties to /etc/hadoop/conf/yarn-site.xml (a
> sketch follows the list), emptied /data/slider/conf/slider-client.xml, and
> restarted both RMs.
>
>    - hadoop.registry.zk.quorum
>    - hadoop.registry.zk.root
>    - slider.yarn.queue
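>
> As a rough sketch, the yarn-site.xml entries might look like this (the host
> names, port, and values below are placeholders, not the actual values from
> this cluster):
>
>    <property>
>      <name>hadoop.registry.zk.quorum</name>
>      <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
>    </property>
>    <property>
>      <name>hadoop.registry.zk.root</name>
>      <value>/registry</value>
>    </property>
>    <property>
>      <name>slider.yarn.queue</name>
>      <value>default</value>
>    </property>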
>
> Now there are no issues in creating or destroying the cluster. This helps,
> as it keeps all configs in one location - thanks for the update.
>
> I am still hitting the original issue - starting the application with RM1
> active and then failing over from RM1 to RM2 leads to the Slider AM getting
> "Client cannot authenticate via:[TOKEN]" errors.
>
> I will upload the config files soon ...
>
> Thanks,
>
> On Thu, Jul 28, 2016 at 5:28 PM, Manoj Samel <manojsamelt...@gmail.com>
> wrote:
>
>> Thanks. I will test with the updated config and then upload the latest
>> ones ...
>>
>> Thanks,
>>
>> Manoj
>>
>> On Thu, Jul 28, 2016 at 5:21 PM, Gour Saha <gs...@hortonworks.com> wrote:
>>
>>> slider.zookeeper.quorum is deprecated and should not be used.
>>> hadoop.registry.zk.quorum is used instead and is typically defined in
>>> yarn-site.xml. So is hadoop.registry.zk.root.
>>>
>>> It is not encouraged to specify slider.yarn.queue at the cluster config
>>> level. Ideally it is best to specify the queue during application
>>> submission, so you can use the --queue option with the slider create
>>> command. You can also set it on the command line using -D
>>> slider.yarn.queue=<> during the create call (examples below). If indeed
>>> all slider apps should go to one and only one queue, then this prop can be
>>> specified in any one of the existing site xml files under
>>> /etc/hadoop/conf.
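>>>
>>> For example, both forms might look like this (the app and queue names are
>>> placeholders):
>>>
>>>   slider create myapp --queue myqueue
>>>   # or, equivalently:
>>>   slider create myapp -D slider.yarn.queue=myqueue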
>>>
>>> -Gour
>>>
>>> On 7/28/16, 4:43 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>
>>> >The following Slider-specific properties are at present added in
>>> >/data/slider/conf/slider-client.xml. If you think they should be picked up
>>> >from the HADOOP_CONF_DIR (/etc/hadoop/conf) files, to which file in
>>> >HADOOP_CONF_DIR should these be added?
>>> >
>>> >   - slider.zookeeper.quorum
>>> >   - hadoop.registry.zk.quorum
>>> >   - hadoop.registry.zk.root
>>> >   - slider.yarn.queue
>>> >
>>> >
>>> >On Thu, Jul 28, 2016 at 4:37 PM, Gour Saha <gs...@hortonworks.com>
>>> >wrote:
>>> >
>>> >> That is strange, since slider-client.xml is indeed not required to
>>> >> contain anything (except <configuration></configuration>) if
>>> >> HADOOP_CONF_DIR has everything that Slider needs. This probably gives an
>>> >> indication that there might be some issue with the cluster configuration
>>> >> based on files solely under HADOOP_CONF_DIR to begin with.
>>> >>
>>> >> I suggest you upload all the config files to the jira to help debug this
>>> >> further.
>>> >>
>>> >> -Gour
>>> >>
>>> >> On 7/28/16, 4:27 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>> >>
>>> >> >Thanks Gour for the prompt reply.
>>> >> >
>>> >> >BTW - Creating an empty slider-client.xml (with just
>>> >> ><configuration></configuration>) does not work. The AM starts but fails
>>> >> >to create any components and shows errors like
>>> >> >
>>> >> >2016-07-28 23:18:46,018
>>> >> >[AmExecutor-006-SendThread(localhost.localdomain:2181)] WARN
>>> >> > zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error,
>>> >> >closing socket connection and attempting reconnect
>>> >> >java.net.ConnectException: Connection refused
>>> >> >        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>> >> >        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>> >> >        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>> >> >        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>> >> >
>>> >> >Also, the command "slider destroy <app>" fails with zookeeper errors ...
>>> >> >
>>> >> >I had to keep a minimal slider-client.xml. It does not have any RM info
>>> >> >etc. but does contain Slider ZK-related properties like
>>> >> >"slider.zookeeper.quorum", "hadoop.registry.zk.quorum", and
>>> >> >"hadoop.registry.zk.root". I haven't yet distilled the absolute minimal
>>> >> >set of properties required, but this should suffice for now. All RM /
>>> >> >HDFS properties will be read from the HADOOP_CONF_DIR files.
>>> >> >
>>> >> >Let me know if this could cause any issues.
>>> >> >
>>> >> >On Thu, Jul 28, 2016 at 3:36 PM, Gour Saha <gs...@hortonworks.com>
>>> >> >wrote:
>>> >> >
>>> >> >> No need to copy any files. Pointing HADOOP_CONF_DIR to
>>> >> >> /etc/hadoop/conf is good.
>>> >> >>
>>> >> >> -Gour
>>> >> >>
>>> >> >> On 7/28/16, 3:24 PM, "Manoj Samel" <manojsamelt...@gmail.com>
>>> >> >> wrote:
>>> >> >>
>>> >> >> >Follow-up question regarding Gour's comment in an earlier thread -
>>> >> >> >
>>> >> >> >Slider is installed on one of the hadoop nodes. The SLIDER_HOME/conf
>>> >> >> >directory (say /data/slider/conf) is different from HADOOP_CONF_DIR
>>> >> >> >(/etc/hadoop/conf). Is it required/recommended that the files in
>>> >> >> >HADOOP_CONF_DIR be copied to SLIDER_HOME/conf and that the slider-env.sh
>>> >> >> >script set HADOOP_CONF_DIR to /data/slider/conf?
>>> >> >> >
>>> >> >> >Or can slider-env.sh set HADOOP_CONF_DIR to /etc/hadoop/conf, without
>>> >> >> >copying the files (a sketch follows)?
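>>> >> >> >
>>> >> >> >For the second option, a minimal sketch of the relevant line in
>>> >> >> >slider-env.sh (assuming nothing is copied):
>>> >> >> >
>>> >> >> >   export HADOOP_CONF_DIR=/etc/hadoop/conf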
>>> >> >> >
>>> >> >> >Using Slider 0.80 for now, but would like to know the recommendation
>>> >> >> >for this and future versions as well.
>>> >> >> >
>>> >> >> >Thanks in advance,
>>> >> >> >
>>> >> >> >Manoj
>>> >> >> >
>>> >> >> >On Tue, Jul 26, 2016 at 3:27 PM, Manoj Samel <manojsamelt...@gmail.com>
>>> >> >> >wrote:
>>> >> >> >
>>> >> >> >> Filed https://issues.apache.org/jira/browse/SLIDER-1158 with logs
>>> >> >> >> and my analysis of the logs.
>>> >> >> >>
>>> >> >> >> On Tue, Jul 26, 2016 at 10:36 AM, Gour Saha <gs...@hortonworks.com>
>>> >> >> >> wrote:
>>> >> >> >>
>>> >> >> >>> Please file a JIRA and upload the logs to it.
>>> >> >> >>>
>>> >> >> >>> On 7/26/16, 10:21 AM, "Manoj Samel" <manojsamelt...@gmail.com>
>>> >> >> >>> wrote:
>>> >> >> >>>
>>> >> >> >>> >Hi Gour,
>>> >> >> >>> >
>>> >> >> >>> >Can you please reach me using your own email-id? I will then send
>>> >> >> >>> >logs to you, along with my analysis - I don't want to send logs on
>>> >> >> >>> >the public list.
>>> >> >> >>> >
>>> >> >> >>> >Thanks,
>>> >> >> >>> >
>>> >> >> >>> >On Mon, Jul 25, 2016 at 5:39 PM, Gour Saha <gs...@hortonworks.com>
>>> >> >> >>> >wrote:
>>> >> >> >>> >
>>> >> >> >>> >> Ok, so this node is not a gateway. It is part of the cluster,
>>> >> >> >>> >> which means you don't need slider-client.xml at all. Just have
>>> >> >> >>> >> HADOOP_CONF_DIR pointing to /etc/hadoop/conf in slider-env.sh and
>>> >> >> >>> >> that should be it.
>>> >> >> >>> >>
>>> >> >> >>> >> So the above simplifies your config setup. It will not solve
>>> >> >> >>> >> either of the 2 problems you are facing.
>>> >> >> >>> >>
>>> >> >> >>> >> Now coming to the 2 issues you are facing, you have to provide
>>> >> >> >>> >> additional logs for us to understand better. Let's start with -
>>> >> >> >>> >> 1. RM logs (specifically between the time when the rm1->rm2
>>> >> >> >>> >> failover is simulated)
>>> >> >> >>> >> 2. Slider App logs
>>> >> >> >>> >>
>>> >> >> >>> >> -Gour
>>> >> >> >>> >>
>>> >> >> >>> >> On 7/25/16, 5:16 PM, "Manoj Samel" <manojsamelt...@gmail.com>
>>> >> >> >>> >> wrote:
>>> >> >> >>> >>
>>> >> >> >>> >> >   1. Not clear about your question on the "gateway" node. The
>>> >> >> >>> >> >   node running slider is part of the hadoop cluster, and there
>>> >> >> >>> >> >   are other services like Oozie that run on this node and
>>> >> >> >>> >> >   utilize hdfs and yarn. So if your question is whether the node
>>> >> >> >>> >> >   is otherwise working for HDFS and Yarn configuration, it is
>>> >> >> >>> >> >   working.
>>> >> >> >>> >> >   2. I copied all files from HADOOP_CONF_DIR (say
>>> >> >> >>> >> >   /etc/hadoop/conf) to the directory containing
>>> >> >> >>> >> >   slider-client.xml (say /data/latest/conf).
>>> >> >> >>> >> >   3. In an earlier email, I had made a mistake where
>>> >> >> >>> >> >   HADOOP_CONF_DIR in the slider-env.sh file was pointing to the
>>> >> >> >>> >> >   original directory /etc/hadoop/conf. I edited it to point to
>>> >> >> >>> >> >   the same directory containing slider-client.xml &
>>> >> >> >>> >> >   slider-env.sh, i.e. /data/latest/conf.
>>> >> >> >>> >> >   4. I emptied slider-client.xml so it just had
>>> >> >> >>> >> >   <configuration></configuration>. The creation of apps worked
>>> >> >> >>> >> >   but the Slider AM still shows the same issue, i.e. when RM1
>>> >> >> >>> >> >   goes from active to standby, the Slider AM goes from RUNNING
>>> >> >> >>> >> >   to ACCEPTED state with the same error about TOKEN. Also NOTE
>>> >> >> >>> >> >   that when slider-client.xml is empty, the "slider destroy xxx"
>>> >> >> >>> >> >   command still fails with Zookeeper connection errors.
>>> >> >> >>> >> >   5. I then added the same parameters (as in my last email -
>>> >> >> >>> >> >   except HADOOP_CONF_DIR) to slider-client.xml and ran. This
>>> >> >> >>> >> >   time slider-env.sh has HADOOP_CONF_DIR pointing to
>>> >> >> >>> >> >   /data/latest/conf and slider-client.xml does not have
>>> >> >> >>> >> >   HADOOP_CONF_DIR. The same issue exists (but "slider destroy"
>>> >> >> >>> >> >   does not fail).
>>> >> >> >>> >> >   6. Could you explain what you expect to pick up from the
>>> >> >> >>> >> >   Hadoop configurations that will help with the RM token? If
>>> >> >> >>> >> >   slider has a token from RM1, and it switches to RM2, it is not
>>> >> >> >>> >> >   clear what slider does to get a delegation token for RM2
>>> >> >> >>> >> >   communication.
>>> >> >> >>> >> >   7. It is worth repeating again that the issue happens only
>>> >> >> >>> >> >   when RM1 was active when the slider app was created and then
>>> >> >> >>> >> >   RM1 becomes standby. If RM2 was active when the slider app was
>>> >> >> >>> >> >   created, then the Slider AM keeps running for any number of
>>> >> >> >>> >> >   switches between RM2 and RM1 back and forth ...
>>> >> >> >>> >> >
>>> >> >> >>> >> >
>>> >> >> >>> >> >On Mon, Jul 25, 2016 at 4:21 PM, Gour Saha <gs...@hortonworks.com>
>>> >> >> >>> >> >wrote:
>>> >> >> >>> >> >
>>> >> >> >>> >> >> The node you are running slider from, is that a gateway node?
>>> >> >> >>> >> >> Sorry for not being explicit. I meant copy everything under
>>> >> >> >>> >> >> /etc/hadoop/conf from your cluster into some temp directory
>>> >> >> >>> >> >> (say /tmp/hadoop_conf) on your gateway node or local or
>>> >> >> >>> >> >> whichever node you are running slider from. Then set
>>> >> >> >>> >> >> HADOOP_CONF_DIR to /tmp/hadoop_conf and clear everything out
>>> >> >> >>> >> >> from slider-client.xml.
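>>> >> >> >>> >> >>
>>> >> >> >>> >> >> Roughly, something like this (a sketch of the steps just
>>> >> >> >>> >> >> described):
>>> >> >> >>> >> >>
>>> >> >> >>> >> >>   cp -r /etc/hadoop/conf /tmp/hadoop_conf
>>> >> >> >>> >> >>   # then, in slider-env.sh:
>>> >> >> >>> >> >>   export HADOOP_CONF_DIR=/tmp/hadoop_conf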
>>> >> >> >>> >> >>
>>> >> >> >>> >> >> On 7/25/16, 4:12 PM, "Manoj Samel" <manojsamelt...@gmail.com>
>>> >> >> >>> >> >> wrote:
>>> >> >> >>> >> >>
>>> >> >> >>> >> >> >Hi Gour,
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >Thanks for your prompt reply.
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >FYI, the issue happens when I create the slider app while rm1
>>> >> >> >>> >> >> >is active and rm1 then fails over to rm2. As soon as rm2
>>> >> >> >>> >> >> >becomes active, the Slider AM goes from RUNNING to ACCEPTED
>>> >> >> >>> >> >> >state with the above error.
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >For your suggestion, I did the following:
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >1) Copied core-site, hdfs-site, yarn-site, and mapred-site
>>> >> >> >>> >> >> >from HADOOP_CONF_DIR to the slider conf directory.
>>> >> >> >>> >> >> >2) Our slider-env.sh already had HADOOP_CONF_DIR set.
>>> >> >> >>> >> >> >3) I removed all properties from slider-client.xml EXCEPT the
>>> >> >> >>> >> >> >following:
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >   - HADOOP_CONF_DIR
>>> >> >> >>> >> >> >   - slider.yarn.queue
>>> >> >> >>> >> >> >   - slider.zookeeper.quorum
>>> >> >> >>> >> >> >   - hadoop.registry.zk.quorum
>>> >> >> >>> >> >> >   - hadoop.registry.zk.root
>>> >> >> >>> >> >> >   - hadoop.security.authorization
>>> >> >> >>> >> >> >   - hadoop.security.authentication
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >Then I made rm1 active, installed and created the slider app,
>>> >> >> >>> >> >> >and restarted rm1 (to make rm2 active). The slider-am again
>>> >> >> >>> >> >> >went from RUNNING to ACCEPTED state.
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >Let me know if you want me to try further changes.
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >If I make slider-client.xml completely empty per your
>>> >> >> >>> >> >> >suggestion, only the slider AM comes up, but it fails to start
>>> >> >> >>> >> >> >components. The AM log shows errors trying to connect to
>>> >> >> >>> >> >> >zookeeper like below.
>>> >> >> >>> >> >> >2016-07-25 23:07:41,532
>>> >> >> >>> >> >> >[AmExecutor-006-SendThread(localhost.localdomain:2181)] WARN
>>> >> >> >>> >> >> >zookeeper.ClientCnxn - Session 0x0 for server null, unexpected
>>> >> >> >>> >> >> >error, closing socket connection and attempting reconnect
>>> >> >> >>> >> >> >java.net.ConnectException: Connection refused
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >Hence I kept minimal info in slider-client.xml.
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >FYI this is Slider version 0.80.
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >Thanks,
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >Manoj
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >On Mon, Jul 25, 2016 at 2:54 PM, Gour Saha <gs...@hortonworks.com>
>>> >> >> >>> >> >> >wrote:
>>> >> >> >>> >> >> >
>>> >> >> >>> >> >> >> If possible, can you copy the entire content of the
>>> >> >> >>> >> >> >> directory /etc/hadoop/conf and then set HADOOP_CONF_DIR in
>>> >> >> >>> >> >> >> slider-env.sh to it. Keep slider-client.xml empty.
>>> >> >> >>> >> >> >>
>>> >> >> >>> >> >> >> Now when you do the same rm1->rm2 and then the reverse
>>> >> >> >>> >> >> >> failovers, do you see the same behaviors?
>>> >> >> >>> >> >> >>
>>> >> >> >>> >> >> >> -Gour
>>> >> >> >>> >> >> >>
>>> >> >> >>> >> >> >> On 7/25/16, 2:28 PM, "Manoj Samel" <manojsamelt...@gmail.com>
>>> >> >> >>> >> >> >> wrote:
>>> >> >> >>> >> >> >>
>>> >> >> >>> >> >> >> >Another observation (for whatever it is worth):
>>> >> >> >>> >> >> >> >
>>> >> >> >>> >> >> >> >If the slider app is created and started when rm2 was
>>> >> >> >>> >> >> >> >active, then it seems to survive switches between rm2 and
>>> >> >> >>> >> >> >> >rm1 (and back). I.e.
>>> >> >> >>> >> >> >> >
>>> >> >> >>> >> >> >> >* rm2 is active
>>> >> >> >>> >> >> >> >* create and start slider application
>>> >> >> >>> >> >> >> >* fail over to rm1. Now the Slider AM keeps running
>>> >> >> >>> >> >> >> >* fail over to rm2 again. Slider AM still keeps running
>>> >> >> >>> >> >> >> >
>>> >> >> >>> >> >> >> >So, it seems if it starts with rm1 active, then the AM goes
>>> >> >> >>> >> >> >> >to "ACCEPTED" state when RM fails over to rm2. If it starts
>>> >> >> >>> >> >> >> >with rm2 active, then it runs fine with any switches
>>> >> >> >>> >> >> >> >between rm1 and rm2.
>>> >> >> >>> >> >> >> >
>>> >> >> >>> >> >> >> >Any feedback?
>>> >> >> >>> >> >> >> >
>>> >> >> >>> >> >> >> >Thanks,
>>> >> >> >>> >> >> >> >
>>> >> >> >>> >> >> >> >Manoj
>>> >> >> >>> >> >> >> >
>>> >> >> >>> >> >> >> >On Mon, Jul 25, 2016 at 12:25 PM, Manoj Samel
>>> >> >> >>> >> >> >> ><manojsamelt...@gmail.com> wrote:
>>> >> >> >>> >> >> >> >
>>> >> >> >>> >> >> >> >> Setup
>>> >> >> >>> >> >> >> >>
>>> >> >> >>> >> >> >> >> - Hadoop 2.6 with RM HA, Kerberos enabled
>>> >> >> >>> >> >> >> >> - Slider 0.80
>>> >> >> >>> >> >> >> >> - In my slider-client.xml, I have added all RM HA
>>> >> >> >>> >> >> >> >>   properties, including the ones mentioned in
>>> >> >> >>> >> >> >> >>   http://markmail.org/message/wnhpp2zn6ixo65e3.
>>> >> >> >>> >> >> >> >>
>>> >> >> >>> >> >> >> >> Following is the issue
>>> >> >> >>> >> >> >> >>
>>> >> >> >>> >> >> >> >> * rm1 is active, rm2 is standby
>>> >> >> >>> >> >> >> >> * deploy and start slider application, it runs fine
>>> >> >> >>> >> >> >> >> * restart rm1, rm2 is now active.
>>> >> >> >>> >> >> >> >> * The slider-am now goes from running into "ACCEPTED"
>>> >> >> >>> >> >> >> >>   mode. It stays there till rm1 is made active again.
>>> >> >> >>> >> >> >> >>
>>> >> >> >>> >> >> >> >> In the slider-am log, it tries to connect to RM2 and the
>>> >> >> >>> >> >> >> >> connection fails due to
>>> >> >> >>> >> >> >> >> org.apache.hadoop.security.AccessControlException: Client
>>> >> >> >>> >> >> >> >> cannot authenticate via:[TOKEN]. See the detailed log below.
>>> >> >> >>> >> >> >> >>
>>> >> >> >>> >> >> >> >> It seems it has some token (delegation token?) for RM1
>>> >> >> >>> >> >> >> >> but tries to use the same(?) for RM2 and fails. Am I
>>> >> >> >>> >> >> >> >> missing some configuration???
>>> >> >> >>> >> >> >> >>
>>> >> >> >>> >> >> >> >> Thanks,
>>> >> >> >>> >> >> >> >>
>>> >> >> >>> >> >> >> >>
>>> >> >> >>> >> >> >> >>
>>> >> >> >>> >> >> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] INFO
>>> >> >> >>> >> >> >> >>  client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>>> >> >> >>> >> >> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN
>>> >> >> >>> >> >> >> >>  security.UserGroupInformation - PriviledgedActionException as:abc@XYZ
>>> >> >> >>> >> >> >> >> (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException:
>>> >> >> >>> >> >> >> >> Client cannot authenticate via:[TOKEN]
>>> >> >> >>> >> >> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN ipc.Client -
>>> >> >> >>> >> >> >> >> Exception encountered while connecting to the server :
>>> >> >> >>> >> >> >> >> org.apache.hadoop.security.AccessControlException: Client cannot
>>> >> >> >>> >> >> >> >> authenticate via:[TOKEN]
>>> >> >> >>> >> >> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN
>>> >> >> >>> >> >> >> >>  security.UserGroupInformation - PriviledgedActionException as:abc@XYZ
>>> >> >> >>> >> >> >> >> (auth:KERBEROS) cause:java.io.IOException:
>>> >> >> >>> >> >> >> >> org.apache.hadoop.security.AccessControlException: Client cannot
>>> >> >> >>> >> >> >> >> authenticate via:[TOKEN]
>>> >> >> >>> >> >> >> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO
>>> >> >> >>> >> >> >> >>  retry.RetryInvocationHandler - Exception while invoking allocate of
>>> >> >> >>> >> >> >> >> class ApplicationMasterProtocolPBClientImpl over rm2 after 287 fail
>>> >> >> >>> >> >> >> >> over attempts. Trying to fail over immediately.
>>> >> >> >>> >> >> >> >> java.io.IOException: Failed on local exception: java.io.IOException:
>>> >> >> >>> >> >> >> >> org.apache.hadoop.security.AccessControlException: Client cannot
>>> >> >> >>> >> >> >> >> authenticate via:[TOKEN]; Host Details : local host is: "<SliderAM
>>> >> >> >>> >> >> >> >> HOST>/<slider AM Host IP>"; destination host is: "<RM2 HOST>":23130;
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>>> >> >> >>> >> >> >> >>         at com.sun.proxy.$Proxy23.allocate(Unknown Source)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>>> >> >> >>> >> >> >> >>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>> >> >> >>> >> >> >> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> >> >> >>> >> >> >> >>         at java.lang.reflect.Method.invoke(Method.java:497)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>>> >> >> >>> >> >> >> >>         at com.sun.proxy.$Proxy24.allocate(Unknown Source)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
>>> >> >> >>> >> >> >> >> Caused by: java.io.IOException:
>>> >> >> >>> >> >> >> >> org.apache.hadoop.security.AccessControlException: Client cannot
>>> >> >> >>> >> >> >> >> authenticate via:[TOKEN]
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682)
>>> >> >> >>> >> >> >> >>         at java.security.AccessController.doPrivileged(Native Method)
>>> >> >> >>> >> >> >> >>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:645)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1442)
>>> >> >> >>> >> >> >> >>         ... 12 more
>>> >> >> >>> >> >> >> >> Caused by: org.apache.hadoop.security.AccessControlException: Client
>>> >> >> >>> >> >> >> >> cannot authenticate via:[TOKEN]
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:555)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:370)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:721)
>>> >> >> >>> >> >> >> >>         at java.security.AccessController.doPrivileged(Native Method)
>>> >> >> >>> >> >> >> >>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>> >> >> >>> >> >> >> >>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:720)
>>> >> >> >>> >> >> >> >>         ... 15 more
>>> >> >> >>> >> >> >> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO
>>> >> >> >>> >> >> >> >>  client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
>>> >> >> >>> >> >> >> >>
>>> >> >> >>> >> >> >>
>>> >> >> >>> >> >> >>
>>> >> >> >>> >> >>
>>> >> >> >>> >> >>
>>> >> >> >>> >>
>>> >> >> >>> >>
>>> >> >> >>>
>>> >> >> >>>
>>> >> >> >>
>>> >> >>
>>> >> >>
>>> >>
>>> >>
>>>
>>>
>>
>
