Follow-up question regarding Gour's comment in the earlier thread: Slider is installed on one of the Hadoop nodes. The SLIDER_HOME/conf directory (say /data/slider/conf) differs from HADOOP_CONF_DIR (/etc/hadoop/conf). Is it required/recommended that the files in HADOOP_CONF_DIR be copied to SLIDER_HOME/conf, with the slider-env.sh script setting HADOOP_CONF_DIR to /data/slider/conf?
Or can slider-env.sh set HADOOP_CONF_DIR to /etc/hadoop/conf without copying the files? Using Slider 0.80 for now, but I would like to know the recommendation for this and future versions as well.

Thanks in advance,

Manoj

On Tue, Jul 26, 2016 at 3:27 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:

> Filed https://issues.apache.org/jira/browse/SLIDER-1158 with logs and my
> analysis of logs.
>
> On Tue, Jul 26, 2016 at 10:36 AM, Gour Saha <gs...@hortonworks.com> wrote:
>
>> Please file a JIRA and upload the logs to it.
>>
>> On 7/26/16, 10:21 AM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>
>>> Hi Gour,
>>>
>>> Can you please reach me using your own email-id? I will then send logs
>>> to you, along with my analysis - I don't want to send logs on the
>>> public list.
>>>
>>> Thanks,
>>>
>>> On Mon, Jul 25, 2016 at 5:39 PM, Gour Saha <gs...@hortonworks.com> wrote:
>>>
>>>> Ok, so this node is not a gateway. It is part of the cluster, which
>>>> means you don't need slider-client.xml at all. Just have
>>>> HADOOP_CONF_DIR pointing to /etc/hadoop/conf in slider-env.sh and
>>>> that should be it.
>>>>
>>>> So the above simplifies your config setup. It will not solve either
>>>> of the 2 problems you are facing.
>>>>
>>>> Now coming to the 2 issues you are facing, you have to provide
>>>> additional logs for us to understand better. Let's start with -
>>>> 1. RM logs (specifically between the time when the rm1->rm2 failover
>>>> is simulated)
>>>> 2. Slider App logs
>>>>
>>>> -Gour
>>>>
>>>> On 7/25/16, 5:16 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>>
>>>>> 1. Not clear about your question on "gateway" node. The node running
>>>>> slider is part of the hadoop cluster and there are other services
>>>>> like Oozie that run on this node that utilize hdfs and yarn. So if
>>>>> your question is whether the node is otherwise working for HDFS and
>>>>> Yarn configuration, it is working.
>>>>> 2. I copied all files from HADOOP_CONF_DIR (say /etc/hadoop/conf) to
>>>>> the directory containing slider-client.xml (say /data/latest/conf).
>>>>> 3. In the earlier email, I had made a mistake where the
>>>>> slider-env.sh HADOOP_CONF_DIR was pointing to the original directory
>>>>> /etc/hadoop/conf. I edited it to point to the same directory
>>>>> containing slider-client.xml & slider-env.sh, i.e. /data/latest/conf.
>>>>> 4. I emptied slider-client.xml. It just had
>>>>> <configuration></configuration>. The creation of the app worked but
>>>>> the Slider AM still shows the same issue, i.e. when RM1 goes from
>>>>> active to standby, the Slider AM goes from RUNNING to ACCEPTED state
>>>>> with the same error about TOKEN. Also NOTE that when
>>>>> slider-client.xml is empty, the "slider destroy xxx" command still
>>>>> fails with Zookeeper connection errors.
>>>>> 5. I then added the same parameters (as in my last email - except
>>>>> HADOOP_CONF_DIR) to slider-client.xml and ran. This time
>>>>> slider-env.sh has HADOOP_CONF_DIR pointing to /data/latest/conf and
>>>>> slider-client.xml does not have HADOOP_CONF_DIR. The same issue
>>>>> exists (but "slider destroy" does not fail).
>>>>> 6. Could you explain what you expect to pick up from the Hadoop
>>>>> configurations that will help with the RM token? If slider has a
>>>>> token from RM1, and it switches to RM2, it is not clear what slider
>>>>> does to get a delegation token for RM2 communication.
>>>>> 7. It is worth repeating again that the issue happens only when RM1
>>>>> was active when the slider app was created and then RM1 becomes
>>>>> standby. If RM2 was active when the slider app was created, then the
>>>>> Slider AM keeps running for any number of switches between RM2 and
>>>>> RM1 back and forth ...
>>>>>
>>>>> On Mon, Jul 25, 2016 at 4:21 PM, Gour Saha <gs...@hortonworks.com> wrote:
>>>>>
>>>>>> The node you are running slider from, is that a gateway node? Sorry
>>>>>> for not being explicit. I meant copy everything under
>>>>>> /etc/hadoop/conf from your cluster into some temp directory (say
>>>>>> /tmp/hadoop_conf) in your gateway node or local or whichever node
>>>>>> you are running slider from. Then set HADOOP_CONF_DIR to
>>>>>> /tmp/hadoop_conf and clear everything out from slider-client.xml.
>>>>>>
>>>>>> On 7/25/16, 4:12 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Gour,
>>>>>>>
>>>>>>> Thanks for your prompt reply.
>>>>>>>
>>>>>>> FYI, the issue happens when I create the slider app while rm1 is
>>>>>>> active and rm1 then fails over to rm2. As soon as rm2 becomes
>>>>>>> active, the Slider AM goes from RUNNING to ACCEPTED state with the
>>>>>>> above error.
>>>>>>>
>>>>>>> For your suggestion, I did the following:
>>>>>>>
>>>>>>> 1) Copied core-site, hdfs-site, yarn-site, and mapred-site from
>>>>>>> HADOOP_CONF_DIR to the slider conf directory.
>>>>>>> 2) Our slider-env.sh already had HADOOP_CONF_DIR set.
>>>>>>> 3) I removed all properties from slider-client.xml EXCEPT the
>>>>>>> following:
>>>>>>>
>>>>>>> - HADOOP_CONF_DIR
>>>>>>> - slider.yarn.queue
>>>>>>> - slider.zookeeper.quorum
>>>>>>> - hadoop.registry.zk.quorum
>>>>>>> - hadoop.registry.zk.root
>>>>>>> - hadoop.security.authorization
>>>>>>> - hadoop.security.authentication
>>>>>>>
>>>>>>> Then I made rm1 active, installed and created the slider app, and
>>>>>>> restarted rm1 (to make rm2 active). The slider-am again went from
>>>>>>> RUNNING to ACCEPTED state.
>>>>>>>
>>>>>>> Let me know if you want me to try further changes.
>>>>>>>
>>>>>>> If I make the slider-client.xml completely empty per your
>>>>>>> suggestion, only the slider AM comes up but it fails to start
>>>>>>> components. The AM log shows errors trying to connect to zookeeper
>>>>>>> like below.
>>>>>>> 2016-07-25 23:07:41,532
>>>>>>> [AmExecutor-006-SendThread(localhost.localdomain:2181)] WARN
>>>>>>> zookeeper.ClientCnxn - Session 0x0 for server null, unexpected
>>>>>>> error, closing socket connection and attempting reconnect
>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>
>>>>>>> Hence I kept minimal info in slider-client.xml.
>>>>>>>
>>>>>>> FYI this is slider version 0.80.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Manoj
>>>>>>>
>>>>>>> On Mon, Jul 25, 2016 at 2:54 PM, Gour Saha <gs...@hortonworks.com> wrote:
>>>>>>>
>>>>>>>> If possible, can you copy the entire content of the directory
>>>>>>>> /etc/hadoop/conf and then set HADOOP_CONF_DIR in slider-env.sh to
>>>>>>>> it. Keep slider-client.xml empty.
>>>>>>>>
>>>>>>>> Now when you do the same rm1->rm2 and then the reverse failovers,
>>>>>>>> do you see the same behaviors?
>>>>>>>>
>>>>>>>> -Gour
>>>>>>>>
>>>>>>>> On 7/25/16, 2:28 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Another observation (for whatever it is worth):
>>>>>>>>>
>>>>>>>>> If the slider app is created and started when rm2 was active,
>>>>>>>>> then it seems to survive switches between rm2 and rm1 (and
>>>>>>>>> back). I.e.
>>>>>>>>>
>>>>>>>>> * rm2 is active
>>>>>>>>> * create and start slider application
>>>>>>>>> * fail over to rm1. Now the Slider AM keeps running
>>>>>>>>> * fail over to rm2 again. Slider AM still keeps running
>>>>>>>>>
>>>>>>>>> So, it seems if it starts with rm1 active, then the AM goes to
>>>>>>>>> "ACCEPTED" state when RM fails over to rm2. If it starts with
>>>>>>>>> rm2 active, then it runs fine with any switches between rm1
>>>>>>>>> and rm2.
>>>>>>>>>
>>>>>>>>> Any feedback?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Manoj
>>>>>>>>>
>>>>>>>>> On Mon, Jul 25, 2016 at 12:25 PM, Manoj Samel
>>>>>>>>> <manojsamelt...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Setup
>>>>>>>>>>
>>>>>>>>>> - Hadoop 2.6 with RM HA, Kerberos enabled
>>>>>>>>>> - Slider 0.80
>>>>>>>>>> - In my slider-client.xml, I have added all RM HA properties,
>>>>>>>>>> including the ones mentioned in
>>>>>>>>>> http://markmail.org/message/wnhpp2zn6ixo65e3.
>>>>>>>>>>
>>>>>>>>>> Following is the issue:
>>>>>>>>>>
>>>>>>>>>> * rm1 is active, rm2 is standby
>>>>>>>>>> * deploy and start the slider application, it runs fine
>>>>>>>>>> * restart rm1, rm2 is now active
>>>>>>>>>> * The slider-am now goes from RUNNING into "ACCEPTED" mode. It
>>>>>>>>>> stays there till rm1 is made active again.
>>>>>>>>>>
>>>>>>>>>> In the slider-am log, it tries to connect to RM2 and the
>>>>>>>>>> connection fails due to
>>>>>>>>>> org.apache.hadoop.security.AccessControlException: Client
>>>>>>>>>> cannot authenticate via:[TOKEN]. See detailed log below.
>>>>>>>>>>
>>>>>>>>>> It seems it has some token (delegation token?) for RM1 but
>>>>>>>>>> tries to use the same(?) for RM2 and fails. Am I missing some
>>>>>>>>>> configuration???
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] INFO
>>>>>>>>>> client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>>>>>>>>>> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN
>>>>>>>>>> security.UserGroupInformation - PriviledgedActionException
>>>>>>>>>> as:abc@XYZ (auth:KERBEROS)
>>>>>>>>>> cause:org.apache.hadoop.security.AccessControlException:
>>>>>>>>>> Client cannot authenticate via:[TOKEN]
>>>>>>>>>> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN
>>>>>>>>>> ipc.Client - Exception encountered while connecting to the
>>>>>>>>>> server : org.apache.hadoop.security.AccessControlException:
>>>>>>>>>> Client cannot authenticate via:[TOKEN]
>>>>>>>>>> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN
>>>>>>>>>> security.UserGroupInformation - PriviledgedActionException
>>>>>>>>>> as:abc@XYZ (auth:KERBEROS) cause:java.io.IOException:
>>>>>>>>>> org.apache.hadoop.security.AccessControlException: Client
>>>>>>>>>> cannot authenticate via:[TOKEN]
>>>>>>>>>> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO
>>>>>>>>>> retry.RetryInvocationHandler - Exception while invoking
>>>>>>>>>> allocate of class ApplicationMasterProtocolPBClientImpl over
>>>>>>>>>> rm2 after 287 fail over attempts. Trying to fail over
>>>>>>>>>> immediately.
>>>>>>>>>> java.io.IOException: Failed on local exception:
>>>>>>>>>> java.io.IOException:
>>>>>>>>>> org.apache.hadoop.security.AccessControlException: Client
>>>>>>>>>> cannot authenticate via:[TOKEN]; Host Details : local host is:
>>>>>>>>>> "<SliderAM HOST>/<slider AM Host IP>"; destination host is:
>>>>>>>>>> "<RM2 HOST>":23130;
>>>>>>>>>>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>>>>>>>>>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>>>>>>>>>>     at com.sun.proxy.$Proxy23.allocate(Unknown Source)
>>>>>>>>>>     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>>>>>>>>>>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>>>>>>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>>>>>>>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>>>>>>>>>>     at com.sun.proxy.$Proxy24.allocate(Unknown Source)
>>>>>>>>>>     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
>>>>>>>>>>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
>>>>>>>>>> Caused by: java.io.IOException:
>>>>>>>>>> org.apache.hadoop.security.AccessControlException: Client
>>>>>>>>>> cannot authenticate via:[TOKEN]
>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682)
>>>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:645)
>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
>>>>>>>>>>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
>>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1442)
>>>>>>>>>>     ... 12 more
>>>>>>>>>> Caused by: org.apache.hadoop.security.AccessControlException:
>>>>>>>>>> Client cannot authenticate via:[TOKEN]
>>>>>>>>>>     at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
>>>>>>>>>>     at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:555)
>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:370)
>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:721)
>>>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:720)
>>>>>>>>>>     ... 15 more
>>>>>>>>>> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO
>>>>>>>>>> client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
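[Editor's note] The setup Gour recommends in this thread, for a node that is already part of the cluster, can be sketched as below. This is only an illustrative sketch, not a verified Slider 0.80 setup: the temp directory stands in for SLIDER_HOME/conf (e.g. /data/slider/conf from the question), and the paths come from the thread itself.

```shell
# Sketch of the recommended layout on a cluster node: slider-env.sh points
# straight at the cluster's Hadoop configs (no copying into SLIDER_HOME/conf)
# and slider-client.xml stays empty.
SLIDER_CONF_DIR=$(mktemp -d)   # stand-in for SLIDER_HOME/conf

cat > "$SLIDER_CONF_DIR/slider-env.sh" <<'EOF'
# Point Slider at the cluster configs in place; per the thread, no copy is
# needed when the node running slider is part of the cluster.
export HADOOP_CONF_DIR=/etc/hadoop/conf
EOF

# An empty slider-client.xml, as suggested for cluster nodes.
cat > "$SLIDER_CONF_DIR/slider-client.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
</configuration>
EOF

grep '^export HADOOP_CONF_DIR' "$SLIDER_CONF_DIR/slider-env.sh"
```

Note that, per Gour's replies above, this layout only simplifies the configuration; it does not by itself resolve the RM HA token failover issue discussed in the thread.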