Hi Gour,

Could you please contact me from your own email address? I will then send the
logs to you, along with my analysis. I don't want to post logs on the public list.

Thanks,

On Mon, Jul 25, 2016 at 5:39 PM, Gour Saha <gs...@hortonworks.com> wrote:

> Ok, so this node is not a gateway. It is part of the cluster, which means
> you don't need slider-client.xml at all. Just have HADOOP_CONF_DIR
> pointing to /etc/hadoop/conf in slider-env.sh and that should be it.
>
> So the above simplifies your config setup. It will not solve either of the
> 2 problems you are facing.
>
> Now coming to the 2 issues you are facing, you have to provide additional
> logs for us to understand better. Let's start with -
> 1. RM logs (specifically between the time when rm1->rm2 failover is
> simulated)
> 2. Slider App logs
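
A minimal sketch of how a log file could be narrowed to the window around the simulated rm1->rm2 failover before it is shared. It assumes each log entry starts with a timestamp in the same format as the AM log lines quoted in this thread (for example 2016-07-25 19:06:48,088); the function names `parse_ts` and `lines_in_window` are hypothetical helpers, not part of Hadoop or Slider.

```python
from datetime import datetime

# Timestamp layout assumed from the AM log lines quoted in this thread.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23  # len("2016-07-25 19:06:48,088")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def lines_in_window(lines, start, end):
    """Yield the log lines whose timestamp falls within [start, end].

    Lines without a timestamp (stack-trace frames, continuation text)
    follow the decision made for the most recent timestamped line.
    """
    keep = False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            keep = start <= ts <= end
        if keep:
            yield line
```

To use it on a real file, open the log, pass the file object as `lines`, and choose `start` and `end` a little before and after the moment rm1 was restarted.
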
>
> -Gour
>
> On 7/25/16, 5:16 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>
> >   1. I am not clear about your question on the "gateway" node. The node
> >   running slider is part of the hadoop cluster, and other services such
> >   as Oozie that use hdfs and yarn also run on this node. So if your
> >   question is whether the node otherwise has a working HDFS and Yarn
> >   configuration, it does.
> >   2. I copied all files from HADOOP_CONF_DIR (say /etc/hadoop/conf) to
> >   the directory containing slider-client.xml (say /data/latest/conf).
> >   3. In my earlier email, I had made a mistake: in slider-env.sh,
> >   HADOOP_CONF_DIR was pointing to the original directory
> >   /etc/hadoop/conf. I edited it to point to the same directory that
> >   contains slider-client.xml and slider-env.sh, i.e. /data/latest/conf.
> >   4. I emptied slider-client.xml. It just had
> >   <configuration></configuration>. The creation of the app worked, but
> >   the Slider AM still shows the same issue, i.e. when RM1 goes from
> >   active to standby, the slider AM goes from RUNNING to ACCEPTED state
> >   with the same error about TOKEN. Also NOTE that when slider-client.xml
> >   is empty, the "slider destroy xxx" command still fails with Zookeeper
> >   connection errors.
> >   5. I then added the same parameters (as in my last email, except
> >   HADOOP_CONF_DIR) to slider-client.xml and ran again. This time
> >   slider-env.sh has HADOOP_CONF_DIR pointing to /data/latest/conf and
> >   slider-client.xml does not have HADOOP_CONF_DIR. The same issue exists
> >   (but "slider destroy" does not fail).
> >   6. Could you explain what you expect to pick up from the Hadoop
> >   configuration that will help with the RM token? If slider has a token
> >   from RM1 and it switches to RM2, it is not clear what slider does to
> >   get a delegation token for RM2 communication.
> >   7. It is worth repeating that the issue happens only when RM1 was
> >   active when the slider app was created and RM1 then becomes standby.
> >   If RM2 was active when the slider app was created, the slider AM keeps
> >   running for any number of switches between RM2 and RM1, back and
> >   forth ...
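
Since the steps above switch between an empty slider-client.xml and one carrying a handful of properties, here is a small sketch for checking which properties a given file actually contains. It assumes the usual Hadoop-style layout of `<property>` elements with `<name>` and `<value>` children under a `<configuration>` root; the function name `property_names` and the sample quorum value are hypothetical.

```python
import xml.etree.ElementTree as ET


def property_names(xml_text):
    """Return the <name> of every <property> in a Hadoop-style config document."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("expected <configuration> root, got <%s>" % root.tag)
    names = []
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            names.append(name.strip())
    return names


# An emptied file, as in step 4, yields no property names.
EMPTY = "<configuration></configuration>"

# A file with one property; the value is a made-up placeholder.
MINIMAL = """
<configuration>
  <property>
    <name>slider.zookeeper.quorum</name>
    <value>zk1.example.com:2181</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    print(property_names(EMPTY))
    print(property_names(MINIMAL))
```

Running the same check against each file in the directory that HADOOP_CONF_DIR points to is one way to confirm which file a setting such as the zookeeper quorum is really coming from.
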
> >
> >
> >On Mon, Jul 25, 2016 at 4:21 PM, Gour Saha <gs...@hortonworks.com> wrote:
> >
> >> The node you are running slider from, is that a gateway node? Sorry for
> >> not being explicit. I meant copy everything under /etc/hadoop/conf from
> >> your cluster into some temp directory (say /tmp/hadoop_conf) in your
> >> gateway node or local or whichever node you are running slider from.
> >> Then set HADOOP_CONF_DIR to /tmp/hadoop_conf and clear everything out
> >> from slider-client.xml.
> >>
> >> On 7/25/16, 4:12 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
> >>
> >> >Hi Gour,
> >> >
> >> >Thanks for your prompt reply.
> >> >
> >> >FYI, the issue happens when I create the slider app while rm1 is active
> >> >and rm1 then fails over to rm2. As soon as rm2 becomes active, the
> >> >slider AM goes from RUNNING to ACCEPTED state with the above error.
> >> >
> >> >Following your suggestion, I did the following:
> >> >
> >> >1) Copied core-site, hdfs-site, yarn-site, and mapred-site from
> >> >HADOOP_CONF_DIR to the slider conf directory.
> >> >2) Our slider-env.sh already had HADOOP_CONF_DIR set
> >> >3) I removed all properties from slider-client.xml EXCEPT the following
> >> >
> >> >   - HADOOP_CONF_DIR
> >> >   - slider.yarn.queue
> >> >   - slider.zookeeper.quorum
> >> >   - hadoop.registry.zk.quorum
> >> >   - hadoop.registry.zk.root
> >> >   - hadoop.security.authorization
> >> >   - hadoop.security.authentication
> >> >
> >> >Then I made rm1 active, installed and created the slider app, and
> >> >restarted rm1 (to make rm2 active). The slider-am again went from
> >> >RUNNING to ACCEPTED state.
> >> >
> >> >Let me know if you want me to try further changes.
> >> >
> >> >If I make slider-client.xml completely empty per your suggestion, only
> >> >the slider AM comes up, but it fails to start components. The AM log
> >> >shows errors trying to connect to zookeeper like below.
> >> >2016-07-25 23:07:41,532 [AmExecutor-006-SendThread(localhost.localdomain:2181)] WARN zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> >> >java.net.ConnectException: Connection refused
> >> >
> >> >Hence I kept minimal info in slider-client.xml
> >> >
> >> >FYI This is slider version 0.80
> >> >
> >> >Thanks,
> >> >
> >> >Manoj
> >> >
> >> >On Mon, Jul 25, 2016 at 2:54 PM, Gour Saha <gs...@hortonworks.com> wrote:
> >> >
> >> >> If possible, can you copy the entire content of the directory
> >> >> /etc/hadoop/conf and then set HADOOP_CONF_DIR in slider-env.sh to it.
> >> >> Keep slider-client.xml empty.
> >> >>
> >> >> Now when you do the same rm1->rm2 and then the reverse failovers, do
> >> >> you see the same behaviors?
> >> >>
> >> >> -Gour
> >> >>
> >> >> On 7/25/16, 2:28 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
> >> >>
> >> >> >Another observation (for whatever it is worth)
> >> >> >
> >> >> >If the slider app is created and started when rm2 is active, then it
> >> >> >seems to survive switches between rm2 and rm1 (and back), i.e.
> >> >> >
> >> >> >* rm2 is active
> >> >> >* create and start slider application
> >> >> >* fail over to rm1. Now the Slider AM keeps running
> >> >> >* fail over to rm2 again. Slider AM still keeps running
> >> >> >
> >> >> >So it seems that if it starts with rm1 active, the AM goes to
> >> >> >"ACCEPTED" state when the RM fails over to rm2. If it starts with rm2
> >> >> >active, it runs fine with any switches between rm1 and rm2.
> >> >> >
> >> >> >Any feedback ?
> >> >> >
> >> >> >Thanks,
> >> >> >
> >> >> >Manoj
> >> >> >
> >> >> >On Mon, Jul 25, 2016 at 12:25 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
> >> >> >
> >> >> >> Setup
> >> >> >>
> >> >> >> - Hadoop 2.6 with RM HA, Kerberos enabled
> >> >> >> - Slider 0.80
> >> >> >> - In my slider-client.xml, I have added all RM HA properties,
> >> >> >> including the ones mentioned in
> >> >> >> http://markmail.org/message/wnhpp2zn6ixo65e3.
> >> >> >>
> >> >> >> Following is the issue
> >> >> >>
> >> >> >> * rm1 is active, rm2 is standby
> >> >> >> * deploy and start slider application, it runs fine
> >> >> >> * restart rm1, rm2 is now active.
> >> >> >> * The slider-am now goes from running into "ACCEPTED" mode. It stays
> >> >> >> there till rm1 is made active again.
> >> >> >>
> >> >> >> In the slider-am log, it tries to connect to RM2 and the connection
> >> >> >> fails due to org.apache.hadoop.security.AccessControlException: Client
> >> >> >> cannot authenticate via:[TOKEN]. See detailed log below.
> >> >> >>
> >> >> >> It seems it has some token (delegation token?) for RM1 but tries to
> >> >> >> use the same(?) for RM2 and fails. Am I missing some configuration?
> >> >> >>
> >> >> >> Thanks,
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] INFO  client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
> >> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN  security.UserGroupInformation - PriviledgedActionException as:abc@XYZ (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> >> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN  ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> >> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN  security.UserGroupInformation - PriviledgedActionException as:abc@XYZ (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> >> >> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO  retry.RetryInvocationHandler - Exception while invoking allocate of class ApplicationMasterProtocolPBClientImpl over rm2 after 287 fail over attempts. Trying to fail over immediately.
> >> >> >> java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]; Host Details : local host is: "<SliderAM HOST>/<slider AM Host IP>"; destination host is: "<RM2 HOST>":23130;
> >> >> >>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> >> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1476)
> >> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> >> >> >>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> >> >> >>         at com.sun.proxy.$Proxy23.allocate(Unknown Source)
> >> >> >>         at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> >> >> >>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> >> >> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> >> >>         at java.lang.reflect.Method.invoke(Method.java:497)
> >> >> >>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
> >> >> >>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> >> >> >>         at com.sun.proxy.$Proxy24.allocate(Unknown Source)
> >> >> >>         at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
> >> >> >>         at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
> >> >> >> Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> >> >> >>         at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682)
> >> >> >>         at java.security.AccessController.doPrivileged(Native Method)
> >> >> >>         at javax.security.auth.Subject.doAs(Subject.java:422)
> >> >> >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >> >> >>         at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:645)
> >> >> >>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> >> >> >>         at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
> >> >> >>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
> >> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1442)
> >> >> >>         ... 12 more
> >> >> >> Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> >> >> >>         at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
> >> >> >>         at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
> >> >> >>         at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:555)
> >> >> >>         at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:370)
> >> >> >>         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
> >> >> >>         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:721)
> >> >> >>         at java.security.AccessController.doPrivileged(Native Method)
> >> >> >>         at javax.security.auth.Subject.doAs(Subject.java:422)
> >> >> >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >> >> >>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:720)
> >> >> >>         ... 15 more
> >> >> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO  client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
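
The log above alternates between "Failing over to rm2" and "Failing over to rm1" with TOKEN failures in between. A rough sketch for summarizing that pattern over a whole AM log is below. It matches only the two message fragments visible in this log; the function name `summarize` is hypothetical, and attributing each failure to the most recently named RM is an approximation.

```python
import re
from collections import Counter

# Message fragments taken from the slider-am log quoted in this thread.
FAILOVER_RE = re.compile(
    r"ConfiguredRMFailoverProxyProvider - Failing over to (\S+)"
)
TOKEN_ERROR = "Client cannot authenticate via:[TOKEN]"


def summarize(lines):
    """Count failover attempts and TOKEN-error lines per target RM.

    Each line containing the TOKEN error is attributed to the RM named in
    the most recent "Failing over to ..." line. One failed call logs the
    error on several lines, so the second counter counts lines, not calls.
    """
    failovers = Counter()
    token_error_lines = Counter()
    current = None
    for line in lines:
        match = FAILOVER_RE.search(line)
        if match:
            current = match.group(1)
            failovers[current] += 1
        elif TOKEN_ERROR in line and current is not None:
            token_error_lines[current] += 1
    return failovers, token_error_lines
```

If the TOKEN errors show up only after failovers to rm2 and never after failovers to rm1, that would match the behavior described in this thread.
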
> >> >> >>
> >> >>
> >> >>
> >>
> >>
>
>
