+1 (non-binding) pending the issue that Sunil/Rohith pointed out

- Verified all hashes and checksums
- Built from source on macOS 10.12.6, Java 1.8.0u65
- Deployed a pseudo cluster
- Ran some example jobs
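
For anyone repeating the checksum step, here is a minimal sketch in Python. It is not the exact procedure used above: the choice of SHA-256 as the default, the function names, and the idea of pasting the published digest in as a string are all assumptions made for illustration.

```python
import hashlib
import hmac


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large release tarballs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path, published_hex, algorithm="sha256"):
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    expected = "".join(published_hex.split()).lower()
    return hmac.compare_digest(file_digest(path, algorithm), expected)
```

Calling `matches_published("<downloaded tarball>", "<digest copied from the release page>")` returns True only when the two digests agree. Signature verification is a separate step and is not covered here.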

Thanks,

Eric

On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wheele...@gmail.com> wrote:

> Sunil / Rohith,
>
> Could you check whether your configs are the same as the ones Jonathan posted?
> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>
> And could you check whether the issue still reproduces with Jonathan's
> configs?
>
> Thanks,
> Wangda
>
>
> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <asur...@apache.org> wrote:
>
> > Thanks for testing, Rohith and Sunil.
> >
> > Can you please confirm whether this is a config issue at your end?
> > We (both Jonathan and myself) just tried testing this on a fresh cluster
> > (both automatic and manual) and we are not able to reproduce this. I've
> > updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453>
> > JIRA with details of the testing.
> >
> > Cheers
> > -Arun/Subru
> >
> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharm...@apache.org> wrote:
> >
> > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> > > issue.
> > >
> > > - Rohith Sharma K S
> > >
> > > On 7 November 2017 at 16:44, Sunil G <sun...@apache.org> wrote:
> > >
> > >> Hi Subru and Arun,
> > >>
> > >> Thanks for driving the 2.9 release. Great work!
> > >>
> > >> I installed a cluster built from source.
> > >> - Ran a few MR jobs with application priority enabled. They run fine.
> > >> - Accessed the new UI and it also seems fine.
> > >>
> > >> However, I am also getting the same issue that Rohith reported.
> > >> - Started an HA cluster
> > >> - Pushed RM to standby
> > >> - Pushed RM back to active, then saw an exception.
> > >>
> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > >>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > >>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > >>
> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > >>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > >>
> > >> Will check and post more details,
> > >>
> > >> - Sunil
> > >>
> > >>
> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharm...@apache.org> wrote:
> > >>
> > >> > Thanks Subru/Arun for the great work!
> > >> >
> > >> > Downloaded the source and built from it. Deployed a non-secured RM HA
> > >> > cluster along with the new YARN UI and ATSv2.
> > >> >
> > >> > I am facing a basic RM HA switch issue after the first successful start.
> > >> > *Is anyone else facing this issue?*
> > >> >
> > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, it never switches
> > >> > to active successfully. The exception trace I see in the log is:
> > >> >
> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> > >> >     ... 4 more
> > >> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > >> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> > >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> > >> >     at java.security.AccessController.doPrivileged(Native Method)
> > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> > >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> > >> >     ... 5 more
> > >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > >> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
> > >> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
> > >> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
> > >> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
> > >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> > >> >     ... 13 more
> > >> >
> > >> > Thanks & Regards
> > >> > Rohith Sharma K S
> > >> >
> > >> > On 4 November 2017 at 04:20, Arun Suresh <asur...@apache.org> wrote:
> > >> >
> > >> > > Hi folks,
> > >> > >
> > >> > >      Apache Hadoop 2.9.0 is the first stable release of the Hadoop 2.9
> > >> > > line and will be the latest stable/production release for Apache
> > >> > > Hadoop. It includes 30 new features with 500+ subtasks, 407
> > >> > > improvements, and 787 bug fixes, all newly fixed issues since 2.8.2.
> > >> > >
> > >> > >       More information about the 2.9.0 release plan can be found here:
> > >> > > https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> > >> > >
> > >> > >       New RC is available at:
> > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> > >> > >
> > >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> > >> > >
> > >> > >       The maven artifacts are available via repository.apache.org at:
> > >> > > https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> > >> > >
> > >> > >       Please try the release and vote; the vote will run for the
> > >> > > usual 5 days, ending on 11/10/2017 at 4pm PST.
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Arun/Subru
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
