+1 (non-binding) pending the issue that Sunil/Rohith pointed out

- Verified all hashes and checksums
- Built from source on macOS 10.12.6, Java 1.8.0u65
- Deployed a pseudo cluster
- Ran some example jobs

Thanks,
Eric

On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wheele...@gmail.com> wrote:

> Sunil / Rohith,
>
> Could you check if your configs are same as Jonathan posted configs?
> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>
> And could you try if using Jonathan's configs can still reproduce the
> issue?
>
> Thanks,
> Wangda
>
>
> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <asur...@apache.org> wrote:
>
> > Thanks for testing Rohith and Sunil
> >
> > Can you please confirm if it is not a config issue at your end ?
> > We (both Jonathan and myself) just tried testing this on a fresh cluster
> > (both automatic and manual) and we are not able to reproduce this. I've
> > updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453>
> > JIRA with details of testing.
> >
> > Cheers
> > -Arun/Subru
> >
> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharm...@apache.org> wrote:
> >
> > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> > > issue.
> > >
> > > - Rohith Sharma K S
> > >
> > > On 7 November 2017 at 16:44, Sunil G <sun...@apache.org> wrote:
> > >
> > >> Hi Subru and Arun.
> > >>
> > >> Thanks for driving 2.9 release. Great work!
> > >>
> > >> I installed cluster built from source.
> > >> - Ran few MR jobs with application priority enabled. Runs fine.
> > >> - Accessed new UI and it also seems fine.
> > >>
> > >> However I am also getting same issue as Rohith reported.
> > >> - Started an HA cluster
> > >> - Pushed RM to standby
> > >> - Pushed back RM to active then seeing an exception.
> > >>
> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > >>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > >>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > >>
> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > >>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > >>
> > >> Will check and post more details,
> > >>
> > >> - Sunil
> > >>
> > >>
> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharm...@apache.org> wrote:
> > >>
> > >> > Thanks Subru/Arun for the great work!
> > >> >
> > >> > Downloaded source and built from it. Deployed RM HA non-secured cluster
> > >> > along with new YARN UI and ATSv2.
> > >> >
> > >> > I am facing basic RM HA switch issue after first time successful start.
> > >> > *Can anyone else is facing this issue?*
> > >> >
> > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switch to
> > >> > active successfully. Exception trace I see from the log is
> > >> >
> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> > >> >     ... 4 more
> > >> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > >> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> > >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> > >> >     at java.security.AccessController.doPrivileged(Native Method)
> > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> > >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> > >> >     ... 5 more
> > >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > >> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
> > >> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
> > >> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
> > >> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
> > >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> > >> >     ... 13 more
> > >> >
> > >> > Thanks & Regards
> > >> > Rohith Sharma K S
> > >> >
> > >> > On 4 November 2017 at 04:20, Arun Suresh <asur...@apache.org> wrote:
> > >> >
> > >> > > Hi folks,
> > >> > >
> > >> > > Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> > >> > > will be the latest stable/production release for Apache Hadoop - it
> > >> > > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> > >> > > fixes new fixed issues since 2.8.2.
> > >> > >
> > >> > > More information about the 2.9.0 release plan can be found here:
> > >> > > https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> > >> > >
> > >> > > New RC is available at:
> > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> > >> > >
> > >> > > The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> > >> > >
> > >> > > The maven artifacts are available via repository.apache.org at:
> > >> > > https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> > >> > >
> > >> > > Please try the release and vote; the vote will run for the usual 5
> > >> > > days, ending on 11/10/2017 4pm PST time.
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Arun/Subru