Is the node you are running Slider from a gateway node? Sorry for not being explicit. I meant: copy everything under /etc/hadoop/conf from your cluster into a temp directory (say /tmp/hadoop_conf) on the gateway node, your local machine, or whichever node you run Slider from. Then set HADOOP_CONF_DIR to /tmp/hadoop_conf and clear everything out of slider-client.xml.
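In case it helps, here is a rough sketch of those two steps as shell functions. The function names are made up for illustration, and I am assuming that "clearing" slider-client.xml means leaving a bare <configuration> element rather than a zero-byte file. Adjust the paths to your setup.

```shell
#!/usr/bin/env bash
# Sketch only: function names and default paths are illustrative assumptions.

# Copy the cluster's Hadoop client configs into a scratch directory
# and point HADOOP_CONF_DIR at the copy (for the current shell only).
stage_hadoop_conf() {
  local src="${1:-/etc/hadoop/conf}"
  local dst="${2:-/tmp/hadoop_conf}"
  mkdir -p "$dst" || return 1
  # "src/." copies the directory's contents, not the directory itself.
  cp -R "$src"/. "$dst"/ || return 1
  export HADOOP_CONF_DIR="$dst"
}

# Back up slider-client.xml, then leave only an empty <configuration> element.
# Assumption: "empty" means no properties, not a zero-byte file.
clear_slider_client_xml() {
  local file="$1"
  cp "$file" "$file.bak" || return 1
  printf '%s\n' '<?xml version="1.0"?>' '<configuration>' '</configuration>' > "$file"
}
```

You would call something like `stage_hadoop_conf /etc/hadoop/conf /tmp/hadoop_conf` on the node where you run Slider (fetch the directory from a cluster node first if it is not present locally), then `clear_slider_client_xml` with the path to your slider-client.xml. Since your slider-env.sh already sets HADOOP_CONF_DIR, that value would need to point at the new directory as well.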
On 7/25/16, 4:12 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:

>Hi Gour,
>
>Thanks for your prompt reply.
>
>FYI, issue happens when I create slider app when rm1 is active and when rm1 fails over to rm2. As soon as rm2 becomes active; the slider AM goes from RUNNING to ACCEPTED state with above error.
>
>For your suggestion, I did following
>
>1) Copied core-site, hdfs-site, yarn-site, and mapred-site from HADOOP_CONF_DIR to slider conf directory.
>2) Our slider-env.sh already had HADOOP_CONF_DIR set
>3) I removed all properties from slider-client.xml EXCEPT following
>
>   - HADOOP_CONF_DIR
>   - slider.yarn.queue
>   - slider.zookeeper.quorum
>   - hadoop.registry.zk.quorum
>   - hadoop.registry.zk.root
>   - hadoop.security.authorization
>   - hadoop.security.authentication
>
>Then I made rm1 active, installed and created slider app and restarted rm1 (to make rm2) active. The slider-am again went from RUNNING to ACCEPTED state.
>
>Let me know if you want me to try further changes.
>
>If I make the slider-client.xml completely empty per your suggestion, only slider AM comes up but it fails to start components. The AM log shows errors trying to connect to zookeeper like below.
>2016-07-25 23:07:41,532 [AmExecutor-006-SendThread(localhost.localdomain:2181)] WARN zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>java.net.ConnectException: Connection refused
>
>Hence I kept minimal info in slider-client.xml
>
>FYI This is slider version 0.80
>
>Thanks,
>
>Manoj
>
>On Mon, Jul 25, 2016 at 2:54 PM, Gour Saha <gs...@hortonworks.com> wrote:
>
>> If possible, can you copy the entire content of the directory /etc/hadoop/conf and then set HADOOP_CONF_DIR in slider-env.sh to it. Keep slider-client.xml empty.
>>
>> Now when you do the same rm1->rm2 and then the reverse failovers, do you see the same behaviors?
>>
>> -Gour
>>
>> On 7/25/16, 2:28 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>
>> >Another observation (whatever it is worth)
>> >
>> >If slider app is created and started when rm2 was active, then it seems to survive switches between rm2 and rm1 (and back). I.e
>> >
>> >* rm2 is active
>> >* create and start slider application
>> >* fail over to rm1. Now the Slider AM keeps running
>> >* fail over to rm2 again. Slider AM still keeps running
>> >
>> >So, it seems if it starts with rm1 active, then the AM goes to "ACCEPTED" state when RM fails to rm2. If it starts with rm2 active, then it runs fine with any switches between rm1 and rm2.
>> >
>> >Any feedback ?
>> >
>> >Thanks,
>> >
>> >Manoj
>> >
>> >On Mon, Jul 25, 2016 at 12:25 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>> >
>> >> Setup
>> >>
>> >> - Hadoop 2.6 with RM HA, Kerberos enabled
>> >> - Slider 0.80
>> >> - In my slider-client.xml, I have added all RM HA properties, including the ones mentioned in http://markmail.org/message/wnhpp2zn6ixo65e3.
>> >>
>> >> Following is the issue
>> >>
>> >> * rm1 is active, rm2 is standby
>> >> * deploy and start slider application, it runs fine
>> >> * restart rm1, rm2 is now active.
>> >> * The slider-am now goes from running into "ACCEPTED" mode. It stays there till rm1 is made active again.
>> >>
>> >> In the slider-am log, it tries to connect to RM2 and connection fails due to org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]. See detailed log below
>> >>
>> >> It seems it has some token (delegation token?) for RM1 but tries to use same(?) for RM2 and fails. Am I missing some configuration ???
>> >>
>> >> Thanks,
>> >>
>> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] INFO client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN security.UserGroupInformation - PriviledgedActionException as:abc@XYZ (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
>> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
>> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN security.UserGroupInformation - PriviledgedActionException as:abc@XYZ (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
>> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO retry.RetryInvocationHandler - Exception while invoking allocate of class ApplicationMasterProtocolPBClientImpl over rm2 after 287 fail over attempts. Trying to fail over immediately.
>> >> java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]; Host Details : local host is: "<SliderAM HOST>/<slider AM Host IP>"; destination host is: "<RM2 HOST>":23130;
>> >>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>> >>     at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>> >>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>> >>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>> >>     at com.sun.proxy.$Proxy23.allocate(Unknown Source)
>> >>     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>> >>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>> >>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >>     at java.lang.reflect.Method.invoke(Method.java:497)
>> >>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>> >>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>> >>     at com.sun.proxy.$Proxy24.allocate(Unknown Source)
>> >>     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
>> >>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
>> >> Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
>> >>     at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682)
>> >>     at java.security.AccessController.doPrivileged(Native Method)
>> >>     at javax.security.auth.Subject.doAs(Subject.java:422)
>> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>> >>     at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:645)
>> >>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
>> >>     at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
>> >>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
>> >>     at org.apache.hadoop.ipc.Client.call(Client.java:1442)
>> >>     ... 12 more
>> >> Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
>> >>     at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
>> >>     at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
>> >>     at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:555)
>> >>     at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:370)
>> >>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
>> >>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:721)
>> >>     at java.security.AccessController.doPrivileged(Native Method)
>> >>     at javax.security.auth.Subject.doAs(Subject.java:422)
>> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>> >>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:720)
>> >>     ... 15 more
>> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
>> >>