Hi Gour,

Thanks for your prompt reply.

FYI, the issue happens when I create the slider app while rm1 is active and rm1 then fails over to rm2. As soon as rm2 becomes active, the slider AM goes from RUNNING to ACCEPTED state with the error reported earlier.

Per your suggestion, I did the following:

1) Copied core-site, hdfs-site, yarn-site, and mapred-site from HADOOP_CONF_DIR to the slider conf directory.
2) Our slider-env.sh already had HADOOP_CONF_DIR set.
3) Removed all properties from slider-client.xml EXCEPT the following:
   - HADOOP_CONF_DIR
   - slider.yarn.queue
   - slider.zookeeper.quorum
   - hadoop.registry.zk.quorum
   - hadoop.registry.zk.root
   - hadoop.security.authorization
   - hadoop.security.authentication

Then I made rm1 active, installed and created the slider app, and restarted rm1 (to make rm2 active). The slider AM again went from RUNNING to ACCEPTED state.

Let me know if you want me to try further changes.

If I make slider-client.xml completely empty per your suggestion, only the slider AM comes up, but it fails to start components. The AM log shows errors trying to connect to ZooKeeper, like the one below:

2016-07-25 23:07:41,532 [AmExecutor-006-SendThread(localhost.localdomain:2181)] WARN zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused

Hence I kept minimal info in slider-client.xml.

FYI, this is slider version 0.80.

Thanks,

Manoj

On Mon, Jul 25, 2016 at 2:54 PM, Gour Saha <gs...@hortonworks.com> wrote:

> If possible, can you copy the entire content of the directory
> /etc/hadoop/conf and then set HADOOP_CONF_DIR in slider-env.sh to it. Keep
> slider-client.xml empty.
>
> Now when you do the same rm1->rm2 and then the reverse failovers, do you
> see the same behaviors?
>
> -Gour
>
> On 7/25/16, 2:28 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>
> >Another observation (whatever it is worth)
> >
> >If slider app is created and started when rm2 was active, then it seems to
> >survive switches between rm2 and rm1 (and back).
> >I.e
> >
> >* rm2 is active
> >* create and start slider application
> >* fail over to rm1. Now the Slider AM keeps running
> >* fail over to rm2 again. Slider AM still keeps running
> >
> >So, it seems if it starts with rm1 active, then the AM goes to "ACCEPTED"
> >state when RM fails to rm2. If it starts with rm2 active, then it runs
> >fine with any switches between rm1 and rm2.
> >
> >Any feedback ?
> >
> >Thanks,
> >
> >Manoj
> >
> >On Mon, Jul 25, 2016 at 12:25 PM, Manoj Samel <manojsamelt...@gmail.com>
> >wrote:
> >
> >> Setup
> >>
> >> - Hadoop 2.6 with RM HA, Kerberos enabled
> >> - Slider 0.80
> >> - In my slider-client.xml, I have added all RM HA properties, including
> >> the ones mentioned in http://markmail.org/message/wnhpp2zn6ixo65e3.
> >>
> >> Following is the issue
> >>
> >> * rm1 is active, rm2 is standby
> >> * deploy and start slider application, it runs fine
> >> * restart rm1, rm2 is now active.
> >> * The slider-am now goes from running into "ACCEPTED" mode. It stays there
> >> till rm1 is made active again.
> >>
> >> In the slider-am log, it tries to connect to RM2 and connection fails due
> >> to org.apache.hadoop.security.AccessControlException: Client cannot
> >> authenticate via:[TOKEN]. See detailed log below
> >>
> >> It seems it has some token (delegation token?) for RM1 but tries to use
> >> same(?) for RM2 and fails. Am I missing some configuration ???
> >>
> >> Thanks,
> >>
> >>
> >>
> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] INFO client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN security.UserGroupInformation - PriviledgedActionException as:abc@XYZ (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN security.UserGroupInformation - PriviledgedActionException as:abc@XYZ (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO retry.RetryInvocationHandler - Exception while invoking allocate of class ApplicationMasterProtocolPBClientImpl over rm2 after 287 fail over attempts. Trying to fail over immediately.
> >> java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]; Host Details : local host is: "<SliderAM HOST>/<slider AM Host IP>"; destination host is: "<RM2 HOST>":23130;
> >>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> >>     at org.apache.hadoop.ipc.Client.call(Client.java:1476)
> >>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> >>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> >>     at com.sun.proxy.$Proxy23.allocate(Unknown Source)
> >>     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> >>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> >>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>     at java.lang.reflect.Method.invoke(Method.java:497)
> >>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
> >>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> >>     at com.sun.proxy.$Proxy24.allocate(Unknown Source)
> >>     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
> >>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
> >> Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> >>     at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at javax.security.auth.Subject.doAs(Subject.java:422)
> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >>     at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:645)
> >>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> >>     at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
> >>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
> >>     at org.apache.hadoop.ipc.Client.call(Client.java:1442)
> >>     ... 12 more
> >> Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> >>     at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
> >>     at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
> >>     at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:555)
> >>     at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:370)
> >>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
> >>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:721)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at javax.security.auth.Subject.doAs(Subject.java:422)
> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:720)
> >>     ... 15 more
> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
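
For reference, the trimmed slider-client.xml from step 3 at the top of this message would look roughly like the sketch below. The property names are the ones listed there; every value is a placeholder or an assumption (for example, `true` for hadoop.security.authorization and `/registry` for the registry root), not the actual setting from this cluster.

```xml
<configuration>
  <!-- Sketch only: all values are placeholders, substitute the real cluster settings. -->

  <!-- Kept as listed in the message; the path is a placeholder. -->
  <property>
    <name>HADOOP_CONF_DIR</name>
    <value>/path/to/slider/conf</value>
  </property>

  <!-- YARN queue the Slider AM is submitted to (placeholder name). -->
  <property>
    <name>slider.yarn.queue</name>
    <value>QUEUE_NAME</value>
  </property>

  <!-- ZooKeeper quorum used by Slider and by the YARN registry (placeholder hosts). -->
  <property>
    <name>slider.zookeeper.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
  <property>
    <name>hadoop.registry.zk.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
  <property>
    <name>hadoop.registry.zk.root</name>
    <value>/registry</value>
  </property>

  <!-- Security settings; "kerberos" matches the Kerberos-enabled setup, "true" is assumed. -->
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
```

Note that no RM HA properties appear here; in this arrangement they are expected to come from the yarn-site.xml copied into the slider conf directory.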