1. Not clear about your question on the "gateway" node. The node running Slider is part of the Hadoop cluster, and other services (e.g. Oozie) run on this node and use HDFS and YARN. So if your question is whether the node is otherwise working with the HDFS and YARN configuration, it is working.
2. I copied all files from HADOOP_CONF_DIR (say /etc/hadoop/conf) to the directory containing slider-client.xml (say /data/latest/conf).
3. In an earlier email I had made a mistake: in slider-env.sh, HADOOP_CONF_DIR was pointing to the original directory /etc/hadoop/conf. I edited it to point to the same directory containing slider-client.xml and slider-env.sh, i.e. /data/latest/conf.
4. I emptied slider-client.xml; it just had <configuration></configuration>. The creation of spas worked, but the Slider AM still shows the same issue: when RM1 goes from active to standby, the Slider AM goes from RUNNING to ACCEPTED state with the same error about TOKEN. Also NOTE that when slider-client.xml is empty, the "slider destroy xxx" command still fails with Zookeeper connection errors.
5. I then added the same parameters (as in my last email, except HADOOP_CONF_DIR) to slider-client.xml and ran again. This time slider-env.sh has HADOOP_CONF_DIR pointing to /data/latest/conf and slider-client.xml does not have HADOOP_CONF_DIR. The same issue exists (but "slider destroy" does not fail).
6. Could you explain what you expect to pick up from the Hadoop configurations that will help with the RM token? If Slider has a token from RM1, and it switches to RM2, it is not clear what Slider does to get a delegation token for communicating with RM2.
7. It is worth repeating that the issue happens only when RM1 was active when the Slider app was created and RM1 then becomes standby. If RM2 was active when the Slider app was created, the Slider AM keeps running through any number of switches between RM2 and RM1, back and forth.
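For concreteness, the pared-down slider-client.xml described in point 5 would look roughly like the sketch below. The property names are the ones named in this thread; every value shown is a placeholder, not this cluster's actual setting:

```xml
<configuration>
  <!-- All values below are placeholders for illustration only. -->
  <property>
    <name>slider.yarn.queue</name>
    <value>default</value>
  </property>
  <property>
    <name>slider.zookeeper.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
  <property>
    <name>hadoop.registry.zk.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
  <property>
    <name>hadoop.registry.zk.root</name>
    <value>/registry</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
```

With HADOOP_CONF_DIR set only in slider-env.sh, the ZooKeeper/registry and security properties above are what keep "slider destroy" working even though the RM HA settings now come from the copied Hadoop configuration.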
On Mon, Jul 25, 2016 at 4:21 PM, Gour Saha <gs...@hortonworks.com> wrote:

> The node you are running slider from, is that a gateway node? Sorry for
> not being explicit. I meant copy everything under /etc/hadoop/conf from
> your cluster into some temp directory (say /tmp/hadoop_conf) in your
> gateway node or local or whichever node you are running slider from. Then
> set HADOOP_CONF_DIR to /tmp/hadoop_conf and clear everything out from
> slider-client.xml.
>
> On 7/25/16, 4:12 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>
> >Hi Gour,
> >
> >Thanks for your prompt reply.
> >
> >FYI, the issue happens when I create the slider app when rm1 is active
> >and rm1 then fails over to rm2. As soon as rm2 becomes active, the
> >slider AM goes from RUNNING to ACCEPTED state with the above error.
> >
> >For your suggestion, I did the following:
> >
> >1) Copied core-site, hdfs-site, yarn-site, and mapred-site from
> >HADOOP_CONF_DIR to the slider conf directory.
> >2) Our slider-env.sh already had HADOOP_CONF_DIR set.
> >3) I removed all properties from slider-client.xml EXCEPT the following:
> >
> > - HADOOP_CONF_DIR
> > - slider.yarn.queue
> > - slider.zookeeper.quorum
> > - hadoop.registry.zk.quorum
> > - hadoop.registry.zk.root
> > - hadoop.security.authorization
> > - hadoop.security.authentication
> >
> >Then I made rm1 active, installed and created the slider app, and
> >restarted rm1 (to make rm2 active). The slider-am again went from
> >RUNNING to ACCEPTED state.
> >
> >Let me know if you want me to try further changes.
> >
> >If I make slider-client.xml completely empty per your suggestion, the
> >slider AM comes up but it fails to start components. The AM log shows
> >errors trying to connect to zookeeper like below.
> >2016-07-25 23:07:41,532
> >[AmExecutor-006-SendThread(localhost.localdomain:2181)] WARN
> >zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error,
> >closing socket connection and attempting reconnect
> >java.net.ConnectException: Connection refused
> >
> >Hence I kept minimal info in slider-client.xml
> >
> >FYI this is slider version 0.80
> >
> >Thanks,
> >
> >Manoj
> >
> >On Mon, Jul 25, 2016 at 2:54 PM, Gour Saha <gs...@hortonworks.com> wrote:
> >
> >> If possible, can you copy the entire content of the directory
> >> /etc/hadoop/conf and then set HADOOP_CONF_DIR in slider-env.sh to it.
> >> Keep slider-client.xml empty.
> >>
> >> Now when you do the same rm1->rm2 and then the reverse failovers, do
> >> you see the same behaviors?
> >>
> >> -Gour
> >>
> >> On 7/25/16, 2:28 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
> >>
> >> >Another observation (whatever it is worth)
> >> >
> >> >If the slider app is created and started when rm2 was active, then it
> >> >seems to survive switches between rm2 and rm1 (and back). I.e.
> >> >
> >> >* rm2 is active
> >> >* create and start slider application
> >> >* fail over to rm1. Now the Slider AM keeps running
> >> >* fail over to rm2 again. Slider AM still keeps running
> >> >
> >> >So, it seems if it starts with rm1 active, then the AM goes to
> >> >"ACCEPTED" state when RM fails over to rm2. If it starts with rm2
> >> >active, then it runs fine with any switches between rm1 and rm2.
> >> >
> >> >Any feedback?
> >> >
> >> >Thanks,
> >> >
> >> >Manoj
> >> >
> >> >On Mon, Jul 25, 2016 at 12:25 PM, Manoj Samel
> >> ><manojsamelt...@gmail.com> wrote:
> >> >
> >> >> Setup
> >> >>
> >> >> - Hadoop 2.6 with RM HA, Kerberos enabled
> >> >> - Slider 0.80
> >> >> - In my slider-client.xml, I have added all RM HA properties,
> >> >> including the ones mentioned in
> >> >> http://markmail.org/message/wnhpp2zn6ixo65e3.
> >> >> Following is the issue:
> >> >>
> >> >> * rm1 is active, rm2 is standby
> >> >> * deploy and start slider application, it runs fine
> >> >> * restart rm1, rm2 is now active.
> >> >> * The slider-am now goes from RUNNING into "ACCEPTED" mode. It
> >> >> stays there till rm1 is made active again.
> >> >>
> >> >> In the slider-am log, it tries to connect to RM2 and the connection
> >> >> fails due to org.apache.hadoop.security.AccessControlException:
> >> >> Client cannot authenticate via:[TOKEN]. See detailed log below.
> >> >>
> >> >> It seems it has some token (delegation token?) for RM1 but tries to
> >> >> use the same(?) for RM2 and fails. Am I missing some configuration?
> >> >>
> >> >> Thanks,
> >> >>
> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] INFO
> >> >> client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN
> >> >> security.UserGroupInformation - PriviledgedActionException as:abc@XYZ
> >> >> (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException:
> >> >> Client cannot authenticate via:[TOKEN]
> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN ipc.Client -
> >> >> Exception encountered while connecting to the server :
> >> >> org.apache.hadoop.security.AccessControlException: Client cannot
> >> >> authenticate via:[TOKEN]
> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN
> >> >> security.UserGroupInformation - PriviledgedActionException as:abc@XYZ
> >> >> (auth:KERBEROS) cause:java.io.IOException:
> >> >> org.apache.hadoop.security.AccessControlException: Client cannot
> >> >> authenticate via:[TOKEN]
> >> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO
> >> >> retry.RetryInvocationHandler - Exception while invoking allocate of
> >> >> class ApplicationMasterProtocolPBClientImpl over rm2 after 287 fail
> >> >> over attempts. Trying to fail over immediately.
> >> >> java.io.IOException: Failed on local exception: java.io.IOException:
> >> >> org.apache.hadoop.security.AccessControlException: Client cannot
> >> >> authenticate via:[TOKEN]; Host Details : local host is: "<SliderAM
> >> >> HOST>/<slider AM Host IP>"; destination host is: "<RM2 HOST>":23130;
> >> >>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1476)
> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> >> >>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> >> >>         at com.sun.proxy.$Proxy23.allocate(Unknown Source)
> >> >>         at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> >> >>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> >> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> >>         at java.lang.reflect.Method.invoke(Method.java:497)
> >> >>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
> >> >>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> >> >>         at com.sun.proxy.$Proxy24.allocate(Unknown Source)
> >> >>         at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
> >> >>         at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
> >> >> Caused by: java.io.IOException:
> >> >> org.apache.hadoop.security.AccessControlException: Client cannot
> >> >> authenticate via:[TOKEN]
> >> >>         at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682)
> >> >>         at java.security.AccessController.doPrivileged(Native Method)
> >> >>         at javax.security.auth.Subject.doAs(Subject.java:422)
> >> >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >> >>         at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:645)
> >> >>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> >> >>         at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
> >> >>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1442)
> >> >>         ... 12 more
> >> >> Caused by: org.apache.hadoop.security.AccessControlException: Client
> >> >> cannot authenticate via:[TOKEN]
> >> >>         at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
> >> >>         at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
> >> >>         at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:555)
> >> >>         at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:370)
> >> >>         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
> >> >>         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:721)
> >> >>         at java.security.AccessController.doPrivileged(Native Method)
> >> >>         at javax.security.auth.Subject.doAs(Subject.java:422)
> >> >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >> >>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:720)
> >> >>         ... 15 more
> >> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO
> >> >> client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
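For anyone following the thread, the setup Gour suggested (copy the cluster's Hadoop config to a scratch directory on the node running Slider, point HADOOP_CONF_DIR at it, and keep slider-client.xml empty) can be sketched as below. This demo uses throwaway directories created with mktemp so it can run anywhere; on a real gateway node SRC_CONF would be /etc/hadoop/conf and TMP_CONF something like /tmp/hadoop_conf.

```shell
# Stand-ins for the real directories; on a gateway node you would use
# SRC_CONF=/etc/hadoop/conf and TMP_CONF=/tmp/hadoop_conf instead.
SRC_CONF="$(mktemp -d)"
TMP_CONF="$(mktemp -d)"

# Fake a few Hadoop config files so the demo is self-contained.
touch "$SRC_CONF/core-site.xml" "$SRC_CONF/yarn-site.xml" \
      "$SRC_CONF/hdfs-site.xml" "$SRC_CONF/mapred-site.xml"

# 1. Copy everything under the cluster's Hadoop conf directory.
cp -r "$SRC_CONF"/. "$TMP_CONF"/

# 2. Point Slider at the copy (this line would live in slider-env.sh).
export HADOOP_CONF_DIR="$TMP_CONF"

# 3. Keep slider-client.xml empty, i.e. just the configuration element.
printf '<configuration>\n</configuration>\n' > "$TMP_CONF/slider-client.xml"

ls "$TMP_CONF"
```

The point of the exercise is to rule out stale or partial RM HA settings in slider-client.xml: with it empty, the AM and client pick up the failover configuration solely from the copied Hadoop files.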