Thanks for your feedback, Vinod; we'll incorporate it when we spin RC1.

-Subru/Arun

On Wed, Nov 8, 2017 at 5:34 PM, Vinod Kumar Vavilapalli <vino...@apache.org>
wrote:

> A related point - I thought I mentioned this in one of the release
> preparation threads, but in any case.
>
> Starting 2.7.0, for every .0 release, we've been adding a disclaimer (to
> the voting thread as well as the final release) that the first release can
> potentially go through additional fixes to incompatible changes (besides
> stabilization fixes). We should do this with 2.9.0 too.
>
> This has some history - long before this, we tried two different things:
> (a) downstream projects consume an RC (b) downstream projects consume a
> release. Option (a) was tried many times, but it was getting increasingly
> hard to manage this across all the projects that depend on Hadoop. When we
> tried option (b), we used to make .0 as a GA release, but downstream
> projects like Tez, Hive, Spark would come back and find an incompatible
> change - and now we were forced into a conundrum - is fixing this
> incompatible change itself an incompatibility? So to avoid this problem,
> we've started marking the first few releases as alpha, eventually making a
> stable point release. Clearly, specific users can still use this in
> production as long as we, the Hadoop community, reserve the right to fix
> incompatibilities.
>
> Long story short, I'd just add to your voting thread and release notes
> that 2.9.0 still needs to be tested downstream and so users may want to
> wait for subsequent point releases.
>
> Thanks
> +Vinod
>
> > On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
> >
> > We are canceling the RC due to the issue that Rohith/Sunil identified. The
> > issue was difficult to track down as it only happens when you use an IP for ZK
> > (works fine with host names) and, moreover, if ZK and RM are co-located on
> > the same machine. We are hopeful to get the fix in tomorrow and roll out RC1.
> >
> > Thanks to everyone for the extensive testing/validation. Hopefully the cost to
> > replicate with RC1 is much lower.
> >
> > -Subru/Arun.
> >
> > On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkarana...@gmail.com> wrote:
> >
> >> +1 from me too.
> >>
> >> Did the following:
> >> 1) set up a 9-node cluster;
> >> 2) ran some Gridmix jobs;
> >> 3) ran (2) after enabling opportunistic containers (used a mix of
> >> guaranteed and opportunistic containers for each job);
> >> 4) ran (3) but this time enabling distributed scheduling of opportunistic
> >> containers.
> >>
> >> All the above worked with no issues.
> >>
> >> Thanks for all the effort guys!
> >>
> >> Konstantinos
> >>
> >> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <ebad...@oath.com.invalid>
> >> wrote:
> >>
> >>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
> >>>
> >>> - Verified all hashes and checksums
> >>> - Built from source on macOS 10.12.6, Java 1.8.0u65
> >>> - Deployed a pseudo cluster
> >>> - Ran some example jobs
> >>>
> >>> Thanks,
> >>>
> >>> Eric
> >>>
> >>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wheele...@gmail.com> wrote:
> >>>
> >>>> Sunil / Rohith,
> >>>>
> >>>> Could you check whether your configs are the same as the ones Jonathan posted?
> >>>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
> >>>>
> >>>> And could you check whether the issue still reproduces with Jonathan's
> >>>> configs?
> >>>>
> >>>> Thanks,
> >>>> Wangda
> >>>>
> >>>>
> >>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <asur...@apache.org> wrote:
> >>>>
> >>>>> Thanks for testing, Rohith and Sunil.
> >>>>>
> >>>>> Can you please confirm that it is not a config issue at your end?
> >>>>> We (both Jonathan and myself) just tried testing this on a fresh cluster
> >>>>> (both automatic and manual) and we are not able to reproduce this. I've
> >>>>> updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
> >>>>> with details of testing.
> >>>>>
> >>>>> Cheers
> >>>>> -Arun/Subru
> >>>>>
> >>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharm...@apache.org> wrote:
> >>>>>
> >>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> >>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> >>>>>> issue.
> >>>>>>
> >>>>>> - Rohith Sharma K S
> >>>>>>
> >>>>>> On 7 November 2017 at 16:44, Sunil G <sun...@apache.org> wrote:
> >>>>>>
> >>>>>>> Hi Subru and Arun.
> >>>>>>>
> >>>>>>> Thanks for driving the 2.9 release. Great work!
> >>>>>>>
> >>>>>>> I installed a cluster built from source.
> >>>>>>> - Ran a few MR jobs with application priority enabled. Runs fine.
> >>>>>>> - Accessed the new UI and it also seems fine.
> >>>>>>>
> >>>>>>> However, I am also getting the same issue that Rohith reported.
> >>>>>>> - Started an HA cluster
> >>>>>>> - Pushed RM to standby
> >>>>>>> - Pushed RM back to active, and then saw an exception.
> >>>>>>>
> >>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >>>>>>>        at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >>>>>>>        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >>>>>>>
> >>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>>>>>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >>>>>>>        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >>>>>>>
> >>>>>>> Will check and post more details,
> >>>>>>>
> >>>>>>> - Sunil
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharm...@apache.org> wrote:
> >>>>>>>
> >>>>>>>> Thanks Subru/Arun for the great work!
> >>>>>>>>
> >>>>>>>> Downloaded source and built from it. Deployed an RM HA non-secured cluster
> >>>>>>>> along with the new YARN UI and ATSv2.
> >>>>>>>>
> >>>>>>>> I am facing a basic RM HA switch issue after the first successful start.
> >>>>>>>> *Is anyone else facing this issue?*
> >>>>>>>>
> >>>>>>>> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
> >>>>>>>> active successfully. The exception trace I see in the log is:
> >>>>>>>>
> >>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> >>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> >>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> >>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> >>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> >>>>>>>>    ... 4 more
> >>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>>>>>>>    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> >>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> >>>>>>>>    at java.security.AccessController.doPrivileged(Native Method)
> >>>>>>>>    at javax.security.auth.Subject.doAs(Subject.java:422)
> >>>>>>>>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> >>>>>>>>    ... 5 more
> >>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>>>>>>>    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >>>>>>>>    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >>>>>>>>    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> >>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> >>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> >>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> >>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
> >>>>>>>>    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> >>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
> >>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
> >>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
> >>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> >>>>>>>>    ... 13 more
> >>>>>>>>
> >>>>>>>> Thanks & Regards
> >>>>>>>> Rohith Sharma K S
> >>>>>>>>
> >>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <asur...@apache.org> wrote:
> >>>>>>>>
> >>>>>>>>> Hi folks,
> >>>>>>>>>
> >>>>>>>>>     Apache Hadoop 2.9.0 is the first stable release of the Hadoop 2.9 line and
> >>>>>>>>> will be the latest stable/production release for Apache Hadoop. It
> >>>>>>>>> includes 30 New Features with 500+ subtasks, 407 Improvements, and 787 Bug
> >>>>>>>>> fixes, all newly fixed issues since 2.8.2.
> >>>>>>>>>
> >>>>>>>>>      More information about the 2.9.0 release plan can be found here:
> >>>>>>>>> https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> >>>>>>>>>
> >>>>>>>>>      New RC is available at:
> >>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> >>>>>>>>>
> >>>>>>>>>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> >>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> >>>>>>>>>
> >>>>>>>>>      The maven artifacts are available via repository.apache.org at:
> >>>>>>>>> https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> >>>>>>>>>
> >>>>>>>>>      Please try the release and vote; the vote will run for the usual 5
> >>>>>>>>> days, ending on 11/10/2017 at 4pm PST.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Arun/Subru
> >>>>>>>>>