A few thoughts:
1. To echo Andrew Wang, HDFS-8578 (parallel upgrades) should be a prerequisite for HDFS-8791. Without that patch, upgrades can be very slow for data nodes, depending on your setup.
2. We have already deployed this patch internally, so with my Twitter hat on, I would be perfectly happy as long as it makes it into trunk and 2.8. That being said, I would be hesitant to deploy the current 2.7.x or 2.6.x releases without this patch on a large production cluster that has a diverse set of block IDs, especially if your data nodes have a large number of disks or you are using federation. To be clear, though: this depends heavily on your setup, and at a minimum you should verify that this regression will not affect you. The current block-ID-based layout in 2.6.x and 2.7.2 has a performance regression that gets worse over time. When it happens on a live cluster, it is one of the harder issues to root-cause and debug. I do understand that this currently affects only a small number of users, but I also think that number has the potential to grow over time. Maybe we can issue a warning in the release notes for future 2.7.x and 2.6.x releases?
3. One option (this was suggested on HDFS-8791, and I think Sean alluded to this proposal on this thread) would be to cut a 2.8 release off of the 2.7.3 release with the new layout. What people currently think of as 2.8 would then become 2.9. This would give customers a stable release that they could deploy with the new layout, without breaking upgrade and downgrade expectations.

On Fri, Apr 1, 2016 at 11:32 AM, Andrew Purtell <apurt...@apache.org> wrote:

> As a downstream consumer of Apache Hadoop 2.7.x releases, I expect we would
> patch the release to revert HDFS-8791 before pushing it out to production.
> For what it's worth.
>
>
> On Fri, Apr 1, 2016 at 11:23 AM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
> > One other thing I wanted to bring up regarding HDFS-8791: we haven't
> > backported the parallel DN upgrade improvement (HDFS-8578) to branch-2.6.
> > HDFS-8578 is a very important related fix, since otherwise upgrades will be
> > very slow.
> >
> > On Thu, Mar 31, 2016 at 10:35 AM, Andrew Wang <andrew.w...@cloudera.com>
> > wrote:
> >
> > > As I expressed on HDFS-8791, I do not want to include this JIRA in a
> > > maintenance release. I've only seen it crop up on a handful of our
> > > customers' clusters, and large users like Twitter and Yahoo that seem to
> > > be more affected are also the most able to patch this change in
> > > themselves.
> > >
> > > Layout upgrades are quite disruptive, and I don't think it's worth
> > > breaking upgrade and downgrade expectations when it doesn't affect the
> > > (in my experience) vast majority of users.
> > >
> > > Vinod seemed to have a similar opinion in his comment on HDFS-8791, but
> > > will let him elaborate.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Thu, Mar 31, 2016 at 9:11 AM, Sean Busbey <bus...@cloudera.com>
> > > wrote:
> > >
> > >> As of 2 days ago, there were already 135 JIRAs associated with 2.7.3;
> > >> if *any* of them ends up introducing a regression, the inclusion of
> > >> HDFS-8791 means that folks will have cluster downtime in order to back
> > >> things out. If that happens to any substantial number of downstream
> > >> folks, or any particularly vocal downstream folks, then it is very
> > >> likely we'll lose the remaining trust of operators for rolling out
> > >> maintenance releases. That's a pretty steep cost.
> > >>
> > >> Please do not include HDFS-8791 in any 2.6.z release. Folks having to
> > >> be aware that an upgrade from e.g. 2.6.5 to 2.7.2 will fail is an
> > >> unreasonable burden.
> > >>
> > >> I agree that this fix is important; I just think we should either cut
> > >> a version of 2.8 that includes it or find a way to do it that gives an
> > >> operational path for rolling downgrade.
> > >>
> > >> On Thu, Mar 31, 2016 at 10:10 AM, Junping Du <j...@hortonworks.com>
> > >> wrote:
> > >> > Thanks for bringing up this topic, Sean.
> > >> > When I released our latest Hadoop release, 2.6.4, the HDFS-8791
> > >> > patch hadn't been committed yet, which is why we didn't discuss this
> > >> > earlier.
> > >> > I remember that in the JIRA discussion, we treated this layout change
> > >> > as a Blocker bug fixing a significant, previously introduced
> > >> > performance regression rather than as a normal performance
> > >> > improvement. And I believe the HDFS community already did its best,
> > >> > with care and patience, to deliver the fix and other related patches
> > >> > (like the upgrade fix in HDFS-8578). Take HDFS-8578 as an example: you
> > >> > can see 30+ rounds of patch review back and forth by senior
> > >> > committers, not to mention the outstanding performance test data in
> > >> > HDFS-8791.
> > >> > I would trust our HDFS committers' judgement to land HDFS-8791 in
> > >> > 2.7.3. However, that needs final confirmation from Vinod, who serves as
> > >> > RM for branch-2.7. In addition, I don't see any blocker issue that
> > >> > requires bringing it into 2.6.5 now.
> > >> > Just my 2 cents.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Junping
> > >> >
> > >> > ________________________________________
> > >> > From: Sean Busbey <bus...@cloudera.com>
> > >> > Sent: Thursday, March 31, 2016 2:57 PM
> > >> > To: hdfs-...@hadoop.apache.org
> > >> > Cc: Hadoop Common; yarn-...@hadoop.apache.org;
> > >> > mapreduce-...@hadoop.apache.org
> > >> > Subject: Re: 2.7.3 release plan
> > >> >
> > >> > A layout change in a maintenance release sounds very risky. I saw some
> > >> > discussion on the JIRA about those risks, but the consensus seemed to
> > >> > be "we'll leave it up to the 2.6 and 2.7 release managers." I thought
> > >> > we did RMs per release rather than per branch? No one claiming to be a
> > >> > release manager ever spoke up AFAICT.
> > >> >
> > >> > Should this change be included? Should it go into a special 2.8
> > >> > release as mentioned in the ticket?
> > >> >
> > >> > On Thu, Mar 31, 2016 at 1:45 AM, Akira AJISAKA
> > >> > <ajisa...@oss.nttdata.co.jp> wrote:
> > >> >> Thank you, Vinod!
> > >> >>
> > >> >> FYI: 2.7.3 will be a somewhat special release.
> > >> >>
> > >> >> HDFS-8791 bumped up the datanode layout version,
> > >> >> so rolling downgrade from 2.7.3 to 2.7.[0-2]
> > >> >> is impossible. We can roll back instead.
> > >> >>
> > >> >> https://issues.apache.org/jira/browse/HDFS-8791
> > >> >> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
> > >> >>
> > >> >> Regards,
> > >> >> Akira
> > >> >>
> > >> >>
> > >> >> On 3/31/16 08:18, Vinod Kumar Vavilapalli wrote:
> > >> >>>
> > >> >>> Hi all,
> > >> >>>
> > >> >>> Got nudged about 2.7.3. Was previously waiting for 2.6.4 to go out
> > >> >>> (which did go out mid-February). Got a little busy since.
> > >> >>>
> > >> >>> Following up on the 2.7.2 maintenance release, we should work towards
> > >> >>> a 2.7.3. The focus obviously is to have blocker issues [1], bug fixes
> > >> >>> and *no* features / improvements.
> > >> >>>
> > >> >>> I hope to cut an RC in a week, giving enough time for outstanding
> > >> >>> blocker / critical issues. Will start moving out any tickets that are
> > >> >>> not blockers and/or won’t fit the timeline; there are 3 blockers and
> > >> >>> 15 critical tickets outstanding as of now.
> > >> >>>
> > >> >>> Thanks,
> > >> >>> +Vinod
> > >> >>>
> > >> >>> [1] 2.7.3 release blockers:
> > >> >>> https://issues.apache.org/jira/issues/?filter=12335343
> > >> >>>
> > >> >>
> > >> >
> > >> > --
> > >> > busbey
> > >>
> > >> --
> > >> busbey
> > >>
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)