Thanks for testing, Rohith and Sunil.

Can you please confirm whether this is a config issue at your end? We (both Jonathan and I) just tried testing this on a fresh cluster (both automatic and manual failover) and we are not able to reproduce it. I've updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA with details of the testing.
Cheers
-Arun/Subru

On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharm...@apache.org> wrote:

> Thanks Sunil for the confirmation. Btw, I have raised YARN-7453
> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> issue.
>
> - Rohith Sharma K S
>
> On 7 November 2017 at 16:44, Sunil G <sun...@apache.org> wrote:
>
>> Hi Subru and Arun.
>>
>> Thanks for driving the 2.9 release. Great work!
>>
>> I installed a cluster built from source.
>> - Ran a few MR jobs with application priority enabled. Runs fine.
>> - Accessed the new UI and it also seems fine.
>>
>> However, I am also getting the same issue that Rohith reported.
>> - Started an HA cluster
>> - Pushed RM to standby
>> - Pushed RM back to active, then saw an exception.
>>
>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>
>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>
>> Will check and post more details.
>>
>> - Sunil
>>
>>
>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharm...@apache.org> wrote:
>>
>> > Thanks Subru/Arun for the great work!
>> >
>> > Downloaded the source and built from it. Deployed a non-secured RM HA
>> > cluster along with the new YARN UI and ATSv2.
>> >
>> > I am facing a basic RM HA switch issue after the first successful start.
>> > *Is anyone else facing this issue?*
>> >
>> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches
>> > to active successfully. The exception trace I see in the log is:
>> >
>> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>> >     ... 4 more
>> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at javax.security.auth.Subject.doAs(Subject.java:422)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>> >     ... 5 more
>> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>> >     ... 13 more
>> >
>> > Thanks & Regards
>> > Rohith Sharma K S
>> >
>> > On 4 November 2017 at 04:20, Arun Suresh <asur...@apache.org> wrote:
>> >
>> > > Hi folks,
>> > >
>> > > Apache Hadoop 2.9.0 is the first stable release of the Hadoop 2.9 line
>> > > and will be the latest stable/production release for Apache Hadoop. It
>> > > includes 30 New Features with 500+ subtasks, 407 Improvements, and 787
>> > > Bug fixes newly fixed since 2.8.2.
>> > >
>> > > More information about the 2.9.0 release plan can be found here:
>> > > https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>> > >
>> > > New RC is available at:
>> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>> > >
>> > > The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>> > >
>> > > The maven artifacts are available via repository.apache.org at:
>> > > https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>> > >
>> > > Please try the release and vote; the vote will run for the usual 5
>> > > days, ending on 11/10/2017 at 4pm PST.
>> > >
>> > > Thanks,
>> > >
>> > > Arun/Subru