Thanks Vinod for your feedback, we'll incorporate it when we spin RC1. -Subru/Arun
On Wed, Nov 8, 2017 at 5:34 PM, Vinod Kumar Vavilapalli <vino...@apache.org> wrote:

> A related point - I thought I mentioned this in one of the release
> preparation threads, but in any case.
>
> Starting 2.7.0, for every .0 release, we've been adding a disclaimer (to
> the voting thread as well as the final release) that the first release can
> potentially go through additional fixes to incompatible changes (besides
> stabilization fixes). We should do this with 2.9.0 too.
>
> This has some history - long before this, we tried two different things:
> (a) downstream projects consume an RC (b) downstream projects consume a
> release. Option (a) was tried many times but it was increasingly getting
> hard to manage this across all the projects that depend on Hadoop. When we
> tried option (b), we used to make .0 as a GA release, but downstream
> projects like Tez, Hive, Spark would come back and find an incompatible
> change - and now we were forced into a conundrum - is fixing this
> incompatible change itself an incompatibility? So to avoid this problem,
> we've started marking the first few releases as alpha eventually making a
> stable point release. Clearly, specific users can still use this in
> production as long as we the Hadoop community reserve the right to fix
> incompatibilities.
>
> Long story short, I'd just add to your voting thread and release notes
> that 2.9.0 still needs to be tested downstream and so users may want to
> wait for subsequent point releases.
>
> Thanks
> +Vinod
>
> On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
> >
> > We are canceling the RC due to the issue that Rohith/Sunil identified. The
> > issue was difficult to track down as it only happens when you use IP for ZK
> > (works fine with host names) and moreover if ZK and RM are co-located on
> > same machine. We are hopeful to get the fix in tomorrow and roll out RC1.
> >
> > Thanks to everyone for the extensive testing/validation.
> > Hopefully cost to replicate with RC1 is much lower.
> >
> > -Subru/Arun.
> >
> > On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkarana...@gmail.com> wrote:
> >
> >> +1 from me too.
> >>
> >> Did the following:
> >> 1) set up a 9-node cluster;
> >> 2) ran some Gridmix jobs;
> >> 3) ran (2) after enabling opportunistic containers (used a mix of
> >> guaranteed and opportunistic containers for each job);
> >> 4) ran (3) but this time enabling distributed scheduling of opportunistic
> >> containers.
> >>
> >> All the above worked with no issues.
> >>
> >> Thanks for all the effort guys!
> >>
> >> Konstantinos
> >>
> >> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <ebad...@oath.com.invalid> wrote:
> >>
> >>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
> >>>
> >>> - Verified all hashes and checksums
> >>> - Built from source on macOS 10.12.6, Java 1.8.0u65
> >>> - Deployed a pseudo cluster
> >>> - Ran some example jobs
> >>>
> >>> Thanks,
> >>>
> >>> Eric
> >>>
> >>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wheele...@gmail.com> wrote:
> >>>
> >>>> Sunil / Rohith,
> >>>>
> >>>> Could you check if your configs are same as Jonathan posted configs?
> >>>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
> >>>>
> >>>> And could you try if using Jonathan's configs can still reproduce the
> >>>> issue?
> >>>>
> >>>> Thanks,
> >>>> Wangda
> >>>>
> >>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <asur...@apache.org> wrote:
> >>>>
> >>>>> Thanks for testing Rohith and Sunil
> >>>>>
> >>>>> Can you please confirm if it is not a config issue at your end ?
> >>>>> We (both Jonathan and myself) just tried testing this on a fresh cluster
> >>>>> (both automatic and manual) and we are not able to reproduce this.
> >>>>> I've updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
> >>>>> with details of testing.
> >>>>>
> >>>>> Cheers
> >>>>> -Arun/Subru
> >>>>>
> >>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharm...@apache.org> wrote:
> >>>>>
> >>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> >>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> >>>>>> issue.
> >>>>>>
> >>>>>> - Rohith Sharma K S
> >>>>>>
> >>>>>> On 7 November 2017 at 16:44, Sunil G <sun...@apache.org> wrote:
> >>>>>>
> >>>>>>> Hi Subru and Arun.
> >>>>>>>
> >>>>>>> Thanks for driving 2.9 release. Great work!
> >>>>>>>
> >>>>>>> I installed cluster built from source.
> >>>>>>> - Ran few MR jobs with application priority enabled. Runs fine.
> >>>>>>> - Accessed new UI and it also seems fine.
> >>>>>>>
> >>>>>>> However I am also getting same issue as Rohith reported.
> >>>>>>> - Started an HA cluster
> >>>>>>> - Pushed RM to standby
> >>>>>>> - Pushed back RM to active then seeing an exception.
> >>>>>>>
> >>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >>>>>>>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >>>>>>>
> >>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>>>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >>>>>>>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >>>>>>>
> >>>>>>> Will check and post more details,
> >>>>>>>
> >>>>>>> - Sunil
> >>>>>>>
> >>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharm...@apache.org> wrote:
> >>>>>>>
> >>>>>>>> Thanks Subru/Arun for the great work!
> >>>>>>>>
> >>>>>>>> Downloaded source and built from it. Deployed RM HA non-secured cluster
> >>>>>>>> along with new YARN UI and ATSv2.
> >>>>>>>>
> >>>>>>>> I am facing a basic RM HA switch issue after the first successful start.
> >>>>>>>> *Is anyone else facing this issue?*
> >>>>>>>>
> >>>>>>>> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
> >>>>>>>> active successfully. Exception trace I see from the log is
> >>>>>>>>
> >>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> >>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >>>>>>>>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >>>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >>>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> >>>>>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> >>>>>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> >>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> >>>>>>>>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> >>>>>>>>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> >>>>>>>>     ... 4 more
> >>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>>>>>>>     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> >>>>>>>>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> >>>>>>>>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> >>>>>>>>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> >>>>>>>>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> >>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
> >>>>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> >>>>>>>>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> >>>>>>>>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> >>>>>>>>     ... 5 more
> >>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>>>>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >>>>>>>>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >>>>>>>>     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> >>>>>>>>     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> >>>>>>>>     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> >>>>>>>>     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> >>>>>>>>     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
> >>>>>>>>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> >>>>>>>>     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
> >>>>>>>>     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
> >>>>>>>>     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> >>>>>>>>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> >>>>>>>>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
> >>>>>>>>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> >>>>>>>>     ... 13 more
> >>>>>>>>
> >>>>>>>> Thanks & Regards
> >>>>>>>> Rohith Sharma K S
> >>>>>>>>
> >>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <asur...@apache.org> wrote:
> >>>>>>>>
> >>>>>>>>> Hi folks,
> >>>>>>>>>
> >>>>>>>>> Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> >>>>>>>>> will be the latest stable/production release for Apache Hadoop - it
> >>>>>>>>> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> >>>>>>>>> fixes new fixed issues since 2.8.2.
> >>>>>>>>>
> >>>>>>>>> More information about the 2.9.0 release plan can be found here:
> >>>>>>>>> https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> >>>>>>>>>
> >>>>>>>>> New RC is available at:
> >>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> >>>>>>>>>
> >>>>>>>>> The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> >>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> >>>>>>>>>
> >>>>>>>>> The maven artifacts are available via repository.apache.org at:
> >>>>>>>>> https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> >>>>>>>>>
> >>>>>>>>> Please try the release and vote; the vote will run for the usual 5
> >>>>>>>>> days, ending on 11/10/2017 4pm PST time.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Arun/Subru
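
For anyone triaging a similar failover log: the nested trace in Rohith's report unwinds to a single root cause, the last "Caused by:" entry (ZooKeeper's NoAuth). A minimal Python sketch for pulling that entry out of a saved log might look like the following. The names root_cause and SAMPLE are illustrative only and not part of Hadoop, and the regex assumes the usual Java "Caused by: <class>: <message>" layout on a single line, so a wrapped mail-archive copy would need its lines rejoined first.

```python
import re

# Matches lines such as:
#   Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
# Group 1 is the exception class, group 2 the optional message.
CAUSE_RE = re.compile(r"^Caused by:\s*([\w.$]+)(?::\s*(.*))?$")


def root_cause(trace):
    """Return (exception_class, message) from the last 'Caused by:' line, or None."""
    last = None
    for line in trace.splitlines():
        match = CAUSE_RE.match(line.strip())
        if match:
            last = (match.group(1), match.group(2) or "")
    return last


# Shortened excerpt of the trace from this thread (assumed already un-wrapped).
SAMPLE = """\
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
    ... 4 more
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
"""

if __name__ == "__main__":
    print(root_cause(SAMPLE))
```

Run against the excerpt, this reports the NoAuthException entry rather than the outer ServiceFailedException, which is the part of the log that pointed at the ZK ACL problem tracked in YARN-7453.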