No need to copy any files. Pointing HADOOP_CONF_DIR to /etc/hadoop/conf is good.
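For reference, a slider-env.sh along the lines discussed in this thread might look like the sketch below. The HADOOP_CONF_DIR path is the one from this thread; the JAVA_HOME fallback value is illustrative only.

```shell
# slider-env.sh - minimal sketch, per the recommendation above.

# Reuse the cluster's existing Hadoop client configuration rather than
# copying the *-site.xml files into SLIDER_HOME/conf:
export HADOOP_CONF_DIR=/etc/hadoop/conf

# slider-env.sh typically also sets JAVA_HOME; keep whatever the
# environment already provides, falling back to a placeholder path:
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java}"
```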
-Gour

On 7/28/16, 3:24 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:

> Follow-up question regarding Gour's comment in the earlier thread:
>
> Slider is installed on one of the Hadoop nodes. The SLIDER_HOME/conf
> directory (say /data/slider/conf) is different from HADOOP_CONF_DIR
> (/etc/hadoop/conf). Is it required/recommended that the files in
> HADOOP_CONF_DIR be copied to SLIDER_HOME/conf, with the slider-env.sh
> script setting HADOOP_CONF_DIR to /data/slider/conf?
>
> Or can slider-env.sh set HADOOP_CONF_DIR to /etc/hadoop/conf, without
> copying the files?
>
> Using Slider 0.80 for now, but would like to know the recommendation for
> this and future versions as well.
>
> Thanks in advance,
>
> Manoj
>
> On Tue, Jul 26, 2016 at 3:27 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>
>> Filed https://issues.apache.org/jira/browse/SLIDER-1158 with the logs
>> and my analysis of them.
>>
>> On Tue, Jul 26, 2016 at 10:36 AM, Gour Saha <gs...@hortonworks.com> wrote:
>>
>>> Please file a JIRA and upload the logs to it.
>>>
>>> On 7/26/16, 10:21 AM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>
>>>> Hi Gour,
>>>>
>>>> Can you please reach me using your own email id? I will then send the
>>>> logs to you, along with my analysis - I don't want to send logs on
>>>> the public list.
>>>>
>>>> Thanks,
>>>>
>>>> On Mon, Jul 25, 2016 at 5:39 PM, Gour Saha <gs...@hortonworks.com> wrote:
>>>>
>>>>> Ok, so this node is not a gateway. It is part of the cluster, which
>>>>> means you don't need slider-client.xml at all. Just have
>>>>> HADOOP_CONF_DIR pointing to /etc/hadoop/conf in slider-env.sh and
>>>>> that should be it.
>>>>>
>>>>> So the above simplifies your config setup. It will not solve either
>>>>> of the 2 problems you are facing.
>>>>>
>>>>> Now coming to the 2 issues you are facing, you have to provide
>>>>> additional logs for us to understand better. Let's start with:
>>>>> 1. RM logs (specifically between the time when the rm1->rm2
>>>>>    failover is simulated)
>>>>> 2. Slider app logs
>>>>>
>>>>> -Gour
>>>>>
>>>>> On 7/25/16, 5:16 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>>>
>>>>>> 1. Not clear about your question on the "gateway" node. The node
>>>>>>    running Slider is part of the Hadoop cluster, and there are
>>>>>>    other services like Oozie on this node that use HDFS and YARN.
>>>>>>    So if your question is whether the node is otherwise working
>>>>>>    with the HDFS and YARN configuration, it is working.
>>>>>> 2. I copied all files from HADOOP_CONF_DIR (say /etc/hadoop/conf)
>>>>>>    to the directory containing slider-client.xml (say
>>>>>>    /data/latest/conf).
>>>>>> 3. In the earlier email, I had made a mistake: HADOOP_CONF_DIR in
>>>>>>    slider-env.sh was pointing to the original directory
>>>>>>    /etc/hadoop/conf. I edited it to point to the same directory
>>>>>>    containing slider-client.xml and slider-env.sh, i.e.
>>>>>>    /data/latest/conf.
>>>>>> 4. I emptied slider-client.xml so it just had
>>>>>>    <configuration></configuration>. The creation of the app
>>>>>>    worked, but the Slider AM still shows the same issue: when RM1
>>>>>>    goes from active to standby, the Slider AM goes from RUNNING to
>>>>>>    ACCEPTED state with the same error about the TOKEN. Also note
>>>>>>    that when slider-client.xml is empty, the "slider destroy xxx"
>>>>>>    command fails with ZooKeeper connection errors.
>>>>>> 5. I then added the same parameters as in my last email (except
>>>>>>    HADOOP_CONF_DIR) to slider-client.xml and reran. This time
>>>>>>    slider-env.sh has HADOOP_CONF_DIR pointing to /data/latest/conf
>>>>>>    and slider-client.xml does not set HADOOP_CONF_DIR. The same
>>>>>>    issue exists (but "slider destroy" does not fail).
>>>>>> 6. Could you explain what you expect Slider to pick up from the
>>>>>>    Hadoop configuration that would help with the RM token? If
>>>>>>    Slider has a token from RM1 and the cluster switches to RM2, it
>>>>>>    is not clear what Slider does to get a delegation token for
>>>>>>    communicating with RM2.
>>>>>> 7. It is worth repeating that the issue happens only when RM1 was
>>>>>>    active when the Slider app was created and RM1 then becomes
>>>>>>    standby. If RM2 was active when the app was created, the Slider
>>>>>>    AM keeps running through any number of switches between RM2 and
>>>>>>    RM1, back and forth.
>>>>>>
>>>>>> On Mon, Jul 25, 2016 at 4:21 PM, Gour Saha <gs...@hortonworks.com> wrote:
>>>>>>
>>>>>>> The node you are running Slider from, is that a gateway node?
>>>>>>> Sorry for not being explicit. I meant copy everything under
>>>>>>> /etc/hadoop/conf from your cluster into some temp directory (say
>>>>>>> /tmp/hadoop_conf) on your gateway node, or local, or whichever
>>>>>>> node you are running Slider from. Then set HADOOP_CONF_DIR to
>>>>>>> /tmp/hadoop_conf and clear everything out of slider-client.xml.
>>>>>>>
>>>>>>> On 7/25/16, 4:12 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Gour,
>>>>>>>>
>>>>>>>> Thanks for your prompt reply.
>>>>>>>>
>>>>>>>> FYI, the issue happens when I create the Slider app while rm1 is
>>>>>>>> active and rm1 then fails over to rm2. As soon as rm2 becomes
>>>>>>>> active, the Slider AM goes from RUNNING to ACCEPTED state with
>>>>>>>> the above error.
>>>>>>>>
>>>>>>>> Following your suggestion, I did the following:
>>>>>>>>
>>>>>>>> 1) Copied core-site, hdfs-site, yarn-site, and mapred-site from
>>>>>>>>    HADOOP_CONF_DIR to the Slider conf directory.
>>>>>>>> 2) Our slider-env.sh already had HADOOP_CONF_DIR set.
>>>>>>>> 3) I removed all properties from slider-client.xml EXCEPT the
>>>>>>>>    following:
>>>>>>>>
>>>>>>>>    - HADOOP_CONF_DIR
>>>>>>>>    - slider.yarn.queue
>>>>>>>>    - slider.zookeeper.quorum
>>>>>>>>    - hadoop.registry.zk.quorum
>>>>>>>>    - hadoop.registry.zk.root
>>>>>>>>    - hadoop.security.authorization
>>>>>>>>    - hadoop.security.authentication
>>>>>>>>
>>>>>>>> Then I made rm1 active, installed and created the Slider app,
>>>>>>>> and restarted rm1 (to make rm2 active). The Slider AM again went
>>>>>>>> from RUNNING to ACCEPTED state.
>>>>>>>>
>>>>>>>> Let me know if you want me to try further changes.
>>>>>>>>
>>>>>>>> If I make slider-client.xml completely empty per your
>>>>>>>> suggestion, only the Slider AM comes up, and it fails to start
>>>>>>>> the components. The AM log shows errors trying to connect to
>>>>>>>> ZooKeeper like this:
>>>>>>>>
>>>>>>>> 2016-07-25 23:07:41,532 [AmExecutor-006-SendThread(localhost.localdomain:2181)]
>>>>>>>> WARN zookeeper.ClientCnxn - Session 0x0 for server null,
>>>>>>>> unexpected error, closing socket connection and attempting
>>>>>>>> reconnect
>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>
>>>>>>>> Hence I kept minimal info in slider-client.xml.
>>>>>>>>
>>>>>>>> FYI, this is Slider version 0.80.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Manoj
>>>>>>>>
>>>>>>>> On Mon, Jul 25, 2016 at 2:54 PM, Gour Saha <gs...@hortonworks.com> wrote:
>>>>>>>>
>>>>>>>>> If possible, can you copy the entire content of the directory
>>>>>>>>> /etc/hadoop/conf and then set HADOOP_CONF_DIR in slider-env.sh
>>>>>>>>> to it. Keep slider-client.xml empty.
>>>>>>>>>
>>>>>>>>> Now when you do the same rm1->rm2 and then the reverse
>>>>>>>>> failovers, do you see the same behaviors?
>>>>>>>>>
>>>>>>>>> -Gour
>>>>>>>>>
>>>>>>>>> On 7/25/16, 2:28 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Another observation (for whatever it is worth):
>>>>>>>>>>
>>>>>>>>>> If the Slider app is created and started when rm2 is active,
>>>>>>>>>> it seems to survive switches between rm2 and rm1 (and back).
>>>>>>>>>> I.e.:
>>>>>>>>>>
>>>>>>>>>> * rm2 is active
>>>>>>>>>> * create and start the Slider application
>>>>>>>>>> * fail over to rm1; the Slider AM keeps running
>>>>>>>>>> * fail over to rm2 again; the Slider AM still keeps running
>>>>>>>>>>
>>>>>>>>>> So it seems that if it starts with rm1 active, the AM goes to
>>>>>>>>>> ACCEPTED state when the RM fails over to rm2. If it starts
>>>>>>>>>> with rm2 active, it runs fine through any switches between
>>>>>>>>>> rm1 and rm2.
>>>>>>>>>>
>>>>>>>>>> Any feedback?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Manoj
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 25, 2016 at 12:25 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Setup:
>>>>>>>>>>>
>>>>>>>>>>> - Hadoop 2.6 with RM HA, Kerberos enabled
>>>>>>>>>>> - Slider 0.80
>>>>>>>>>>> - In my slider-client.xml, I have added all the RM HA
>>>>>>>>>>>   properties, including the ones mentioned in
>>>>>>>>>>>   http://markmail.org/message/wnhpp2zn6ixo65e3.
>>>>>>>>>>>
>>>>>>>>>>> Following is the issue:
>>>>>>>>>>>
>>>>>>>>>>> * rm1 is active, rm2 is standby
>>>>>>>>>>> * deploy and start the Slider application; it runs fine
>>>>>>>>>>> * restart rm1; rm2 is now active
>>>>>>>>>>> * The Slider AM now goes from RUNNING into ACCEPTED state.
>>>>>>>>>>>   It stays there till rm1 is made active again.
>>>>>>>>>>> In the slider-am log, it tries to connect to rm2 and the
>>>>>>>>>>> connection fails due to
>>>>>>>>>>> org.apache.hadoop.security.AccessControlException: Client
>>>>>>>>>>> cannot authenticate via:[TOKEN]. See the detailed log below.
>>>>>>>>>>>
>>>>>>>>>>> It seems it has some token (delegation token?) for rm1 but
>>>>>>>>>>> tries to use the same one for rm2 and fails. Am I missing
>>>>>>>>>>> some configuration?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] INFO
>>>>>>>>>>> client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>>>>>>>>>>> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN
>>>>>>>>>>> security.UserGroupInformation - PriviledgedActionException as:abc@XYZ
>>>>>>>>>>> (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException:
>>>>>>>>>>> Client cannot authenticate via:[TOKEN]
>>>>>>>>>>> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN ipc.Client -
>>>>>>>>>>> Exception encountered while connecting to the server :
>>>>>>>>>>> org.apache.hadoop.security.AccessControlException: Client cannot
>>>>>>>>>>> authenticate via:[TOKEN]
>>>>>>>>>>> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN
>>>>>>>>>>> security.UserGroupInformation - PriviledgedActionException as:abc@XYZ
>>>>>>>>>>> (auth:KERBEROS) cause:java.io.IOException:
>>>>>>>>>>> org.apache.hadoop.security.AccessControlException: Client cannot
>>>>>>>>>>> authenticate via:[TOKEN]
>>>>>>>>>>> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO
>>>>>>>>>>> retry.RetryInvocationHandler - Exception while invoking allocate of
>>>>>>>>>>> class ApplicationMasterProtocolPBClientImpl over rm2 after 287 failover
>>>>>>>>>>> attempts. Trying to fail over immediately.
>>>>>>>>>>> java.io.IOException: Failed on local exception: java.io.IOException:
>>>>>>>>>>> org.apache.hadoop.security.AccessControlException: Client cannot
>>>>>>>>>>> authenticate via:[TOKEN]; Host Details : local host is: "<SliderAM
>>>>>>>>>>> HOST>/<slider AM Host IP>"; destination host is: "<RM2 HOST>":23130;
>>>>>>>>>>>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>>>>>>>>>>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>>>>>>>>>>>     at com.sun.proxy.$Proxy23.allocate(Unknown Source)
>>>>>>>>>>>     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>>>>>>>>>>>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>>>>>>>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>>>>>>>>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>>>>>>>>>>>     at com.sun.proxy.$Proxy24.allocate(Unknown Source)
>>>>>>>>>>>     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
>>>>>>>>>>>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
>>>>>>>>>>> Caused by: java.io.IOException:
>>>>>>>>>>> org.apache.hadoop.security.AccessControlException: Client cannot
>>>>>>>>>>> authenticate via:[TOKEN]
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682)
>>>>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:645)
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1442)
>>>>>>>>>>>     ... 12 more
>>>>>>>>>>> Caused by: org.apache.hadoop.security.AccessControlException: Client
>>>>>>>>>>> cannot authenticate via:[TOKEN]
>>>>>>>>>>>     at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
>>>>>>>>>>>     at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:555)
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:370)
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:721)
>>>>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>>>>>>>>>>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:720)
>>>>>>>>>>>     ... 15 more
>>>>>>>>>>> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO
>>>>>>>>>>> client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
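For readers following the thread: the minimal slider-client.xml that Manoj describes (removing everything except a handful of properties) would look roughly like the sketch below. The property names are the ones named in the thread; every value (queue name, ZooKeeper quorum hosts, registry root) is a placeholder to adapt to your own cluster.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Minimal slider-client.xml per the experiment described above.
       All values below are placeholders. -->
  <property>
    <name>slider.yarn.queue</name>
    <value>default</value>
  </property>
  <property>
    <name>slider.zookeeper.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
  <property>
    <name>hadoop.registry.zk.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
  <property>
    <name>hadoop.registry.zk.root</name>
    <value>/registry</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
```

Since the node runs inside the cluster, everything else (including the RM HA properties) is picked up from the files under HADOOP_CONF_DIR, per Gour's recommendation at the top of the thread.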