Hi,

I have uploaded the config files; hope these shed light on the TOKEN authentication issue.

As a side note - it seems that commands like "slider list <app> --containers" are now *significantly* slower (compared to when slider-client.xml was not empty and had a few properties). The commands sometimes take 1 minute on the same cluster where they used to take a few seconds. Also, it seems the first command executed after some inactivity takes a long time to execute, while the same command repeated immediately returns quickly. The same is observed when the Slider AM restarts (e.g. due to an upgrade). This slowness was not present when slider-client.xml had config parameters like the registry zookeepers and the RM address. Why would there be such a difference for the first execution when all config is read from the HADOOP_CONF_DIR files?

Following is the output of "slider list <xxx> --containers" executed twice. Note that the first run took almost a minute, while the second run was almost instantaneous.

[root@... ~]# slider list foo --containers
2016-07-29 23:30:35,197 [main] INFO tools.SliderUtils - JVM initialized into secure mode with kerberos realm xxx
2016-07-29 23:31:22,035 [main] INFO client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2016-07-29 23:31:22,162 [main] INFO util.ExitUtil - Exiting with status 0
foo RUNNING application_1469834604094_0001 http://xxx:23188/proxy/application_1469834604094_0001/
......

[root@... ~]# slider list foo --containers
2016-07-29 23:32:34,816 [main] INFO tools.SliderUtils - JVM initialized into secure mode with kerberos realm xxx
2016-07-29 23:32:35,775 [main] INFO client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2016-07-29 23:32:35,896 [main] INFO util.ExitUtil - Exiting with status 0
foo RUNNING application_1469834604094_0001 http://xxx:23188/proxy/application_1469834604094_0001/
..

Thanks,

On Thu, Jul 28, 2016 at 7:01 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
> Hi Gour,
>
> I added the following properties to /etc/hadoop/conf/yarn-site.xml, emptied /data/slider/conf/slider-client.xml, and restarted both RMs.
>
> - hadoop.registry.zk.quorum
> - hadoop.registry.zk.root
> - slider.yarn.queue
>
> Now there are no issues in creating or destroying the cluster. This helps as it keeps all configs in one location - thanks for the update.
>
> I am still hitting the original issue - starting the application with RM1 active and then failing over from RM1 to RM2 leads to the Slider AM getting "Client cannot authenticate via:[TOKEN]" errors.
>
> I will upload the config files soon ...
>
> Thanks,
>
> On Thu, Jul 28, 2016 at 5:28 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>> Thanks. I will test with the updated config and then upload the latest ones ...
>>
>> Thanks,
>>
>> Manoj
>>
>> On Thu, Jul 28, 2016 at 5:21 PM, Gour Saha <gs...@hortonworks.com> wrote:
>>> slider.zookeeper.quorum is deprecated and should not be used. hadoop.registry.zk.quorum is used instead and is typically defined in yarn-site.xml. So is hadoop.registry.zk.root.
>>>
>>> It is not encouraged to specify slider.yarn.queue at the cluster config level. Ideally it is best to specify the queue during application submission, so you can use the --queue option with the slider create cmd. You can also set it on the command line using -D slider.yarn.queue=<> during the create call. If indeed all Slider apps should go to one and only one queue, then this prop can be specified in any one of the existing site xml files under /etc/hadoop/conf.
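>>>
>>> For illustration, a minimal sketch of the registry properties in yarn-site.xml (the ZK host names below are placeholders; /registry is the usual default root):
>>>
>>> <property>
>>>   <name>hadoop.registry.zk.quorum</name>
>>>   <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
>>> </property>
>>> <property>
>>>   <name>hadoop.registry.zk.root</name>
>>>   <value>/registry</value>
>>> </property>
>>>
>>> And for the queue at submission time (app and queue names are placeholders):
>>>
>>> slider create myapp --queue myqueue ...
>>> # or, equivalently:
>>> slider create myapp -D slider.yarn.queue=myqueue ...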
>>>
>>> -Gour
>>>
>>> On 7/28/16, 4:43 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>> The following Slider-specific properties are at present added in /data/slider/conf/slider-client.xml. If you think they should be picked up from the HADOOP_CONF_DIR (/etc/hadoop/conf) files, which file in HADOOP_CONF_DIR should these be added to?
>>>>
>>>> - slider.zookeeper.quorum
>>>> - hadoop.registry.zk.quorum
>>>> - hadoop.registry.zk.root
>>>> - slider.yarn.queue
>>>>
>>>> On Thu, Jul 28, 2016 at 4:37 PM, Gour Saha <gs...@hortonworks.com> wrote:
>>>>> That is strange, since slider-client.xml is indeed not required to contain anything (except <configuration></configuration>) if HADOOP_CONF_DIR has everything that Slider needs. This probably gives an indication that there might be some issue with the cluster configuration based solely on files under HADOOP_CONF_DIR to begin with.
>>>>>
>>>>> Suggest you upload all the config files to the jira to help debug this further.
>>>>>
>>>>> -Gour
>>>>>
>>>>> On 7/28/16, 4:27 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>>>> Thanks Gour for the prompt reply.
>>>>>>
>>>>>> BTW - creating an empty slider-client.xml (with just <configuration></configuration>) does not work. The AM starts but fails to create any components and shows errors like
>>>>>>
>>>>>> 2016-07-28 23:18:46,018 [AmExecutor-006-SendThread(localhost.localdomain:2181)] WARN zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>>>>> java.net.ConnectException: Connection refused
>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>
>>>>>> Also, the command "slider destroy <app>" fails with zookeeper errors ...
>>>>>>
>>>>>> I had to keep a minimal slider-client.xml. It does not have any RM info etc. but does contain the Slider ZK related properties "slider.zookeeper.quorum", "hadoop.registry.zk.quorum", and "hadoop.registry.zk.root". I haven't yet distilled the absolute minimal set of properties required, but this should suffice for now. All RM / HDFS properties will be read from the HADOOP_CONF_DIR files.
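>>>>>>
>>>>>> For reference, a sketch of the minimal slider-client.xml I kept (the quorum host names stand in for our actual values):
>>>>>>
>>>>>> <configuration>
>>>>>>   <property>
>>>>>>     <name>slider.zookeeper.quorum</name>
>>>>>>     <value>zk1.example.com:2181,zk2.example.com:2181</value>
>>>>>>   </property>
>>>>>>   <property>
>>>>>>     <name>hadoop.registry.zk.quorum</name>
>>>>>>     <value>zk1.example.com:2181,zk2.example.com:2181</value>
>>>>>>   </property>
>>>>>>   <property>
>>>>>>     <name>hadoop.registry.zk.root</name>
>>>>>>     <value>/registry</value>
>>>>>>   </property>
>>>>>> </configuration>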
>>>>>>
>>>>>> Let me know if this could cause any issues.
>>>>>>
>>>>>> On Thu, Jul 28, 2016 at 3:36 PM, Gour Saha <gs...@hortonworks.com> wrote:
>>>>>>> No need to copy any files. Pointing HADOOP_CONF_DIR to /etc/hadoop/conf is good.
>>>>>>>
>>>>>>> -Gour
>>>>>>>
>>>>>>> On 7/28/16, 3:24 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>>>>>> Follow-up question regarding Gour's comment in an earlier thread -
>>>>>>>>
>>>>>>>> Slider is installed on one of the hadoop nodes. The SLIDER_HOME/conf directory (say /data/slider/conf) is different from HADOOP_CONF_DIR (/etc/hadoop/conf). Is it required/recommended that the files in HADOOP_CONF_DIR be copied to SLIDER_HOME/conf and that the slider-env.sh script set HADOOP_CONF_DIR to /data/slider/conf?
>>>>>>>>
>>>>>>>> Or can slider-env.sh set HADOOP_CONF_DIR to /etc/hadoop/conf, without copying the files?
>>>>>>>>
>>>>>>>> Using Slider 0.80 for now, but would like to know the recommendation for this and future versions as well.
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> Manoj
>>>>>>>>
>>>>>>>> On Tue, Jul 26, 2016 at 3:27 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>>>>>>>>> Filed https://issues.apache.org/jira/browse/SLIDER-1158 with logs and my analysis of the logs.
>>>>>>>>>
>>>>>>>>> On Tue, Jul 26, 2016 at 10:36 AM, Gour Saha <gs...@hortonworks.com> wrote:
>>>>>>>>>> Please file a JIRA and upload the logs to it.
>>>>>>>>>>
>>>>>>>>>> On 7/26/16, 10:21 AM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>>>>>>>>> Hi Gour,
>>>>>>>>>>>
>>>>>>>>>>> Can you please reach me using your own email-id? I will then send the logs to you, along with my analysis - I don't want to send logs on the public list.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 25, 2016 at 5:39 PM, Gour Saha <gs...@hortonworks.com> wrote:
>>>>>>>>>>>> Ok, so this node is not a gateway. It is part of the cluster, which means you don't need slider-client.xml at all. Just have HADOOP_CONF_DIR pointing to /etc/hadoop/conf in slider-env.sh and that should be it.
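>>>>>>>>>>>>
>>>>>>>>>>>> That is, slider-env.sh would contain just something like:
>>>>>>>>>>>>
>>>>>>>>>>>> export HADOOP_CONF_DIR=/etc/hadoop/conf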
>>>>>>>>>>>>
>>>>>>>>>>>> So the above simplifies your config setup. It will not solve either of the 2 problems you are facing.
>>>>>>>>>>>>
>>>>>>>>>>>> Now coming to the 2 issues you are facing, you have to provide additional logs for us to understand better. Let's start with -
>>>>>>>>>>>> 1. RM logs (specifically between the time when the rm1->rm2 failover is simulated)
>>>>>>>>>>>> 2. Slider App logs
>>>>>>>>>>>>
>>>>>>>>>>>> -Gour
>>>>>>>>>>>>
>>>>>>>>>>>> On 7/25/16, 5:16 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>>>>>>>>>>> 1. Not clear about your question on the "gateway" node. The node running Slider is part of the hadoop cluster, and there are other services like Oozie that run on this node and utilize HDFS and YARN. So if your question is whether the node is otherwise working for the HDFS and YARN configuration, it is working.
>>>>>>>>>>>>> 2. I copied all files from HADOOP_CONF_DIR (say /etc/hadoop/conf) to the directory containing slider-client.xml (say /data/latest/conf).
>>>>>>>>>>>>> 3. In an earlier email, I had made a mistake where the slider-env.sh HADOOP_CONF_DIR was pointing to the original directory /etc/hadoop/conf. I edited it to point to the same directory containing slider-client.xml & slider-env.sh, i.e. /data/latest/conf.
>>>>>>>>>>>>> 4. I emptied slider-client.xml; it just had <configuration></configuration>. The creation of apps worked, but the Slider AM still shows the same issue, i.e. when RM1 goes from active to standby, the Slider AM goes from RUNNING to ACCEPTED state with the same error about TOKEN. Also NOTE that when slider-client.xml is empty, the "slider destroy xxx" command still fails with Zookeeper connection errors.
>>>>>>>>>>>>> 5. I then added the same parameters (as in my last email, except HADOOP_CONF_DIR) to slider-client.xml and ran again. This time slider-env.sh has HADOOP_CONF_DIR pointing to /data/latest/conf and slider-client.xml does not have HADOOP_CONF_DIR. The same issue exists (but "slider destroy" does not fail).
>>>>>>>>>>>>> 6. Could you explain what you expect to be picked up from the Hadoop configurations that will help with the RM token? If Slider has a token from RM1, and it switches to RM2, it is not clear what Slider does to get a delegation token for RM2 communication.
>>>>>>>>>>>>> 7. It is worth repeating that the issue happens only when RM1 was active when the slider app was created and RM1 then becomes standby. If RM2 was active when the slider app was created, then the Slider AM keeps running through any number of switches between RM2 and RM1, back and forth ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 25, 2016 at 4:21 PM, Gour Saha <gs...@hortonworks.com> wrote:
>>>>>>>>>>>>>> The node you are running Slider from, is that a gateway node? Sorry for not being explicit. I meant copy everything under /etc/hadoop/conf from your cluster into some temp directory (say /tmp/hadoop_conf) on your gateway node or local or whichever node you are running Slider from. Then set HADOOP_CONF_DIR to /tmp/hadoop_conf and clear everything out of slider-client.xml.
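>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In other words, something along these lines (the temp directory name is just an example):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> mkdir -p /tmp/hadoop_conf
>>>>>>>>>>>>>> cp /etc/hadoop/conf/* /tmp/hadoop_conf/
>>>>>>>>>>>>>> # in slider-env.sh:
>>>>>>>>>>>>>> export HADOOP_CONF_DIR=/tmp/hadoop_conf
>>>>>>>>>>>>>> # and slider-client.xml reduced to just: <configuration></configuration>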
>>> >> >> >>> >> >> > >>> >> >> >>> >> >> >FYI, issue happens when I create slider app when rm1 is >>> >> >>active >>> >> >> >>>and >>> >> >> >>> >>when >>> >> >> >>> >> >> >rm1 >>> >> >> >>> >> >> >fails over to rm2. As soon as rm2 becomes active; the >>> >>slider >>> >> >>AM >>> >> >> >>> goes >>> >> >> >>> >> >>from >>> >> >> >>> >> >> >RUNNING to ACCEPTED state with above error. >>> >> >> >>> >> >> > >>> >> >> >>> >> >> >For your suggestion, I did following >>> >> >> >>> >> >> > >>> >> >> >>> >> >> >1) Copied core-site, hdfs-site, yarn-site, and >>> mapred-site >>> >> >>from >>> >> >> >>> >> >> >HADOOP_CONF_DIR >>> >> >> >>> >> >> >to slider conf directory. >>> >> >> >>> >> >> >2) Our slider-env.sh already had HADOOP_CONF_DIR set >>> >> >> >>> >> >> >3) I removed all properties from slider-client.xml >>> EXCEPT >>> >> >> >>>following >>> >> >> >>> >> >> > >>> >> >> >>> >> >> > - HADOOP_CONF_DIR >>> >> >> >>> >> >> > - slider.yarn.queue >>> >> >> >>> >> >> > - slider.zookeeper.quorum >>> >> >> >>> >> >> > - hadoop.registry.zk.quorum >>> >> >> >>> >> >> > - hadoop.registry.zk.root >>> >> >> >>> >> >> > - hadoop.security.authorization >>> >> >> >>> >> >> > - hadoop.security.authentication >>> >> >> >>> >> >> > >>> >> >> >>> >> >> >Then I made rm1 active, installed and created slider app >>> >>and >>> >> >> >>> >>restarted >>> >> >> >>> >> >>rm1 >>> >> >> >>> >> >> >(to make rm2) active. The slider-am again went from >>> >>RUNNING >>> >> >>to >>> >> >> >>> >>ACCEPTED >>> >> >> >>> >> >> >state. >>> >> >> >>> >> >> > >>> >> >> >>> >> >> >Let me know if you want me to try further changes. >>> >> >> >>> >> >> > >>> >> >> >>> >> >> >If I make the slider-client.xml completely empty per >>> your >>> >> >> >>> >>suggestion, >>> >> >> >>> >> >>only >>> >> >> >>> >> >> >slider AM comes up but it >>> >> >> >>> >> >> >fails to start components. The AM log shows errors >>> trying >>> >>to >>> >> >> >>> >>connect to >>> >> >> >>> >> >> >zookeeper like below. >>> >> >> >>> >> >> >2016-07-25 23:07:41,532 >>> >> >> >>> >> >> >[AmExecutor-006-SendThread(localhost.localdomain:2181)] >>> >>WARN >>> >> >> >>> >> >> >zookeeper.ClientCnxn - Session 0x0 for server null, >>> >> >>unexpected >>> >> >> >>> >>error, >>> >> >> >>> >> >> >closing socket connection and attempting reconnect >>> >> >> >>> >> >> >java.net.ConnectException: Connection refused >>> >> >> >>> >> >> > >>> >> >> >>> >> >> >Hence I kept minimal info in slider-client.xml >>> >> >> >>> >> >> > >>> >> >> >>> >> >> >FYI This is slider version 0.80 >>> >> >> >>> >> >> > >>> >> >> >>> >> >> >Thanks, >>> >> >> >>> >> >> > >>> >> >> >>> >> >> >Manoj >>> >> >> >>> >> >> > >>> >> >> >>> >> >> >On Mon, Jul 25, 2016 at 2:54 PM, Gour Saha >>> >> >> >>><gs...@hortonworks.com> >>> >> >> >>> >> >>wrote: >>> >> >> >>> >> >> > >>> >> >> >>> >> >> >> If possible, can you copy the entire content of the >>> >> >>directory >>> >> >> >>> >> >> >> /etc/hadoop/conf and then set HADOOP_CONF_DIR in >>> >> >> >>>slider-env.sh to >>> >> >> >>> >>it. >>> >> >> >>> >> >> >>Keep >>> >> >> >>> >> >> >> slider-client.xml empty. >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >> Now when you do the same rm1->rm2 and then the reverse >>> >> >> >>>failovers, >>> >> >> >>> >>do >>> >> >> >>> >> >>you >>> >> >> >>> >> >> >> see the same behaviors? 
>>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >> -Gour >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >> On 7/25/16, 2:28 PM, "Manoj Samel" >>> >> >><manojsamelt...@gmail.com> >>> >> >> >>> >>wrote: >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >> >Another observation (whatever it is worth) >>> >> >> >>> >> >> >> > >>> >> >> >>> >> >> >> >If slider app is created and started when rm2 was >>> >>active, >>> >> >> >>>then >>> >> >> >>> it >>> >> >> >>> >> >> >>seems to >>> >> >> >>> >> >> >> >survive switches between rm2 and rm1 (and back). I.e >>> >> >> >>> >> >> >> > >>> >> >> >>> >> >> >> >* rm2 is active >>> >> >> >>> >> >> >> >* create and start slider application >>> >> >> >>> >> >> >> >* fail over to rm1. Now the Slider AM keeps running >>> >> >> >>> >> >> >> >* fail over to rm2 again. Slider AM still keeps >>> running >>> >> >> >>> >> >> >> > >>> >> >> >>> >> >> >> >So, it seems if it starts with rm1 active, then the >>> AM >>> >> >>goes >>> >> >> >>>to >>> >> >> >>> >> >> >>"ACCEPTED" >>> >> >> >>> >> >> >> >state when RM fails to rm2. If it starts with rm2 >>> >>active, >>> >> >> >>>then >>> >> >> >>> it >>> >> >> >>> >> >>runs >>> >> >> >>> >> >> >> >fine >>> >> >> >>> >> >> >> >with any switches between rm1 and rm2. >>> >> >> >>> >> >> >> > >>> >> >> >>> >> >> >> >Any feedback ? >>> >> >> >>> >> >> >> > >>> >> >> >>> >> >> >> >Thanks, >>> >> >> >>> >> >> >> > >>> >> >> >>> >> >> >> >Manoj >>> >> >> >>> >> >> >> > >>> >> >> >>> >> >> >> >On Mon, Jul 25, 2016 at 12:25 PM, Manoj Samel >>> >> >> >>> >> >> >><manojsamelt...@gmail.com> >>> >> >> >>> >> >> >> >wrote: >>> >> >> >>> >> >> >> > >>> >> >> >>> >> >> >> >> Setup >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >> - Hadoop 2.6 with RM HA, Kerberos enabled >>> >> >> >>> >> >> >> >> - Slider 0.80 >>> >> >> >>> >> >> >> >> - In my slider-client.xml, I have added all RM HA >>> >> >> >>>properties, >>> >> >> >>> >> >> >>including >>> >> >> >>> >> >> >> >> the ones mentioned in >>> >> >> >>> >> >>http://markmail.org/message/wnhpp2zn6ixo65e3. >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >> Following is the issue >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >> * rm1 is active, rm2 is standby >>> >> >> >>> >> >> >> >> * deploy and start slider application, it runs fine >>> >> >> >>> >> >> >> >> * restart rm1, rm2 is now active. >>> >> >> >>> >> >> >> >> * The slider-am now goes from running into >>> "ACCEPTED" >>> >> >> >>>mode. It >>> >> >> >>> >> >>stays >>> >> >> >>> >> >> >> >>there >>> >> >> >>> >> >> >> >> till rm1 is made active again. >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >> In the slider-am log, it tries to connect to RM2 >>> and >>> >> >> >>> connection >>> >> >> >>> >> >>fails >>> >> >> >>> >> >> >> >>due >>> >> >> >>> >> >> >> >> to >>> org.apache.hadoop.security.AccessControlException: >>> >> >> >>>Client >>> >> >> >>> >> >>cannot >>> >> >> >>> >> >> >> >> authenticate via:[TOKEN]. See detailed log below >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >> It seems it has some token (delegation token?) for >>> >>RM1 >>> >> >>but >>> >> >> >>> >>tries >>> >> >> >>> >> >>to >>> >> >> >>> >> >> >>use >>> >> >> >>> >> >> >> >> same(?) for RM2 and fails. Am I missing some >>> >> >>configuration >>> >> >> >>>??? 
>>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >> Thanks, >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] >>> >>INFO >>> >> >> >>> >> >> >> >> client.ConfiguredRMFailoverProxyProvider - Failing >>> >> >>over to >>> >> >> >>> rm2 >>> >> >> >>> >> >> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] >>> >>WARN >>> >> >> >>> >> >> >> >> security.UserGroupInformation - >>> >> >>PriviledgedActionException >>> >> >> >>> >> >> >>as:abc@XYZ >>> >> >> >>> >> >> >> >> (auth:KERBEROS) >>> >> >> >>> >> >> >>> >>cause:org.apache.hadoop.security.AccessControlException: >>> >> >> >>> >> >> >> >> Client cannot authenticate via:[TOKEN] >>> >> >> >>> >> >> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] >>> >>WARN >>> >> >> >>> >> >>ipc.Client - >>> >> >> >>> >> >> >> >> Exception encountered while connecting to the >>> server >>> >>: >>> >> >> >>> >> >> >> >> org.apache.hadoop.security.AccessControlException: >>> >> >>Client >>> >> >> >>> >>cannot >>> >> >> >>> >> >> >> >> authenticate via:[TOKEN] >>> >> >> >>> >> >> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] >>> >>WARN >>> >> >> >>> >> >> >> >> security.UserGroupInformation - >>> >> >>PriviledgedActionException >>> >> >> >>> >> >> >>as:abc@XYZ >>> >> >> >>> >> >> >> >> (auth:KERBEROS) cause:java.io.IOException: >>> >> >> >>> >> >> >> >> org.apache.hadoop.security.AccessControlException: >>> >> >>Client >>> >> >> >>> >>cannot >>> >> >> >>> >> >> >> >> authenticate via:[TOKEN] >>> >> >> >>> >> >> >> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] >>> >>INFO >>> >> >> >>> >> >> >> >> retry.RetryInvocationHandler - Exception while >>> >>invoking >>> >> >> >>> >>allocate >>> >> >> >>> >> >>of >>> >> >> >>> >> >> >> >>class >>> >> >> >>> >> >> >> >> ApplicationMasterProtocolPBClientImpl over rm2 >>> after >>> >>287 >>> >> >> >>>fail >>> >> >> >>> >>over >>> >> >> >>> >> >> >> >> attempts. Trying to fail over immediately. >>> >> >> >>> >> >> >> >> java.io.IOException: Failed on local exception: >>> >> >> >>> >> >>java.io.IOException: >>> >> >> >>> >> >> >> >> org.apache.hadoop.security.AccessControlException: >>> >> >>Client >>> >> >> >>> >>cannot >>> >> >> >>> >> >> >> >> authenticate via:[TOKEN]; Host Details : local host >>> >>is: >>> >> >> >>> >>"<SliderAM >>> >> >> >>> >> >> >> >> HOST>/<slider AM Host IP>"; destination host is: >>> >>"<RM2 >>> >> >> >>> >> >>HOST>":23130; >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >>> >> >> >>> >>>>>org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) >>> >> >> >>> >> >> >> >> at >>> >> >> >>>org.apache.hadoop.ipc.Client.call(Client.java:1476) >>> >> >> >>> >> >> >> >> at >>> >> >> >>>org.apache.hadoop.ipc.Client.call(Client.java:1403) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(Proto >>> >>>>>>>>>>>>>>>bu >>> >> >>>>>>>>>>>>>fR >>> >> >> >>>>>>>>>>>pcE >>> >> >> >>> >>>>>>>>ng >>> >> >> >>> >> >>>>>>in >>> >> >> >>> >> >> >>>>e. 
>>> >> >> >>> >> >> >> >>java:230) >>> >> >> >>> >> >> >> >> at com.sun.proxy.$Proxy23.allocate(Unknown >>> >> >>Source) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterP >>> >>>>>>>>>>>>>>>ro >>> >> >>>>>>>>>>>>>to >>> >> >> >>>>>>>>>>>col >>> >> >> >>> >>>>>>>>PB >>> >> >> >>> >> >>>>>>Cl >>> >> >> >>> >> >> >>>>ie >>> >> >> >>> >> >> >> >>> >> >> >>>>>ntImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> sun.reflect.GeneratedMethodAccessor10.invoke(Unknown >>> >> >> >>> >> >> >>Source) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe >>> >>>>>>>>>>>>>>>th >>> >> >>>>>>>>>>>>>od >>> >> >> >>>>>>>>>>>Acc >>> >> >> >>> >>>>>>>>es >>> >> >> >>> >> >>>>>>so >>> >> >> >>> >> >> >>>>rI >>> >> >> >>> >> >> >> >>mpl.java:43) >>> >> >> >>> >> >> >> >> at >>> >> >>java.lang.reflect.Method.invoke(Method.java:497) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMeth >>> >>>>>>>>>>>>>>>od >>> >> >>>>>>>>>>>>>(R >>> >> >> >>>>>>>>>>>etr >>> >> >> >>> >>>>>>>>yI >>> >> >> >>> >> >>>>>>nv >>> >> >> >>> >> >> >>>>oc >>> >> >> >>> >> >> >> >>ationHandler.java:252) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Ret >>> >>>>>>>>>>>>>>>ry >>> >> >>>>>>>>>>>>>In >>> >> >> >>>>>>>>>>>voc >>> >> >> >>> >>>>>>>>at >>> >> >> >>> >> >>>>>>io >>> >> >> >>> >> >> >>>>nH >>> >> >> >>> >> >> >> >>andler.java:104) >>> >> >> >>> >> >> >> >> at com.sun.proxy.$Proxy24.allocate(Unknown >>> >> >>Source) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.alloca >>> >>>>>>>>>>>>>>>te >>> >> >>>>>>>>>>>>>(A >>> >> >> >>>>>>>>>>>MRM >>> >> >> >>> >>>>>>>>Cl >>> >> >> >>> >> >>>>>>ie >>> >> >> >>> >> >> >>>>nt >>> >> >> >>> >> >> >> >>Impl.java:278) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsync >>> >>>>>>>>>>>>>>>Im >>> >> >>>>>>>>>>>>>pl >>> >> >> >>>>>>>>>>>$He >>> >> >> >>> >>>>>>>>ar >>> >> >> >>> >> >>>>>>tb >>> >> >> >>> >> >> >>>>ea >>> >> >> >>> >> >> >> >>tThread.run(AMRMClientAsyncImpl.java:224) >>> >> >> >>> >> >> >> >> Caused by: java.io.IOException: >>> >> >> >>> >> >> >> >> org.apache.hadoop.security.AccessControlException: >>> >> >>Client >>> >> >> >>> >>cannot >>> >> >> >>> >> >> >> >> authenticate via:[TOKEN] >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >>> >> >> >>>>>org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682) >>> >> >> >>> >> >> >> >> at >>> >> >> 
>>>java.security.AccessController.doPrivileged(Native >>> >> >> >>> >> >>Method) >>> >> >> >>> >> >> >> >> at >>> >> >> >>>javax.security.auth.Subject.doAs(Subject.java:422) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>org.apache.hadoop.security.UserGroupInformation.doAs(UserGro >>> >>>>>>>>>>>>>>>up >>> >> >>>>>>>>>>>>>In >>> >> >> >>>>>>>>>>>for >>> >> >> >>> >>>>>>>>ma >>> >> >> >>> >> >>>>>>ti >>> >> >> >>> >> >> >>>>on >>> >> >> >>> >> >> >> >>.java:1671) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>org.apache.hadoop.ipc.Client$Connection.handleSaslConnection >>> >>>>>>>>>>>>>>>Fa >>> >> >>>>>>>>>>>>>il >>> >> >> >>>>>>>>>>>ure >>> >> >> >>> >>>>>>>>(C >>> >> >> >>> >> >>>>>>li >>> >> >> >>> >> >> >>>>en >>> >> >> >>> >> >> >> >>t.java:645) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client. >>> >>>>>>>>>>>>>ja >>> >> >>>>>>>>>>>va >>> >> >> >>>>>>>>>:73 >>> >> >> >>> >>>>>>3) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >>> >> >> >>> >> >>> >>> >>>>>>>>>org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:37 >>> >>>>>>>>>0) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >>> >>>>org.apache.hadoop.ipc.Client.getConnection(Client.java:1525) >>> >> >> >>> >> >> >> >> at >>> >> >> >>>org.apache.hadoop.ipc.Client.call(Client.java:1442) >>> >> >> >>> >> >> >> >> ... 12 more >>> >> >> >>> >> >> >> >> Caused by: >>> >> >> >>>org.apache.hadoop.security.AccessControlException: >>> >> >> >>> >> >>Client >>> >> >> >>> >> >> >> >> cannot authenticate via:[TOKEN] >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>org.apache.hadoop.security.SaslRpcClient.selectSaslClient(Sa >>> >>>>>>>>>>>>>>>sl >>> >> >>>>>>>>>>>>>Rp >>> >> >> >>>>>>>>>>>cCl >>> >> >> >>> >>>>>>>>ie >>> >> >> >>> >> >>>>>>nt >>> >> >> >>> >> >> >>>>.j >>> >> >> >>> >> >> >> >>ava:172) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpc >>> >>>>>>>>>>>>>>>Cl >>> >> >>>>>>>>>>>>>ie >>> >> >> >>>>>>>>>>>nt. >>> >> >> >>> >>>>>>>>ja >>> >> >> >>> >> >>>>>>va >>> >> >> >>> >> >> >>>>:3 >>> >> >> >>> >> >> >> >>96) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>org.apache.hadoop.ipc.Client$Connection.setupSaslConnection( >>> >>>>>>>>>>>>>>>Cl >>> >> >>>>>>>>>>>>>ie >>> >> >> >>>>>>>>>>>nt. 
>>> >> >> >>> >>>>>>>>ja >>> >> >> >>> >> >>>>>>va >>> >> >> >>> >> >> >>>>:5 >>> >> >> >>> >> >> >> >>55) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >>> >> >> >>> >> >>> >>> >>>>>>>>>org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:37 >>> >>>>>>>>>0) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >>> >> >> >>>>>org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >>> >> >> >>>>>org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:721) >>> >> >> >>> >> >> >> >> at >>> >> >> >>>java.security.AccessController.doPrivileged(Native >>> >> >> >>> >> >>Method) >>> >> >> >>> >> >> >> >> at >>> >> >> >>>javax.security.auth.Subject.doAs(Subject.java:422) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>>>org.apache.hadoop.security.UserGroupInformation.doAs(UserGro >>> >>>>>>>>>>>>>>>up >>> >> >>>>>>>>>>>>>In >>> >> >> >>>>>>>>>>>for >>> >> >> >>> >>>>>>>>ma >>> >> >> >>> >> >>>>>>ti >>> >> >> >>> >> >> >>>>on >>> >> >> >>> >> >> >> >>.java:1671) >>> >> >> >>> >> >> >> >> at >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >> >>> >>> >>>>>>>>>>>>>org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client. >>> >>>>>>>>>>>>>ja >>> >> >>>>>>>>>>>va >>> >> >> >>>>>>>>>:72 >>> >> >> >>> >>>>>>0) >>> >> >> >>> >> >> >> >> ... 15 more >>> >> >> >>> >> >> >> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] >>> >>INFO >>> >> >> >>> >> >> >> >> client.ConfiguredRMFailoverProxyProvider - Failing >>> >> >>over to >>> >> >> >>> rm1 >>> >> >> >>> >> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >>> >> >>> >>> >> >