Re: Large feature development
Hey Arun,

First, let me apologize if my email came off as a personal "snipe" against the project or anyone working on it. I know the team has been hard at work for multiple years now on the project, and I certainly don't mean to denigrate the work anyone has done. I also agree that the improvements made possible by YARN are tremendously important, and I've expressed this opinion both online and in interviews with analysts, etc.

But, I'll stand by my point that YARN is at this point more "alpha" than HDFS2. You brought up two bugs in the HDFS2 code base as examples of HDFS 2 not being high quality.

The first, HDFS-3626, was indeed a messy bug, but had nothing to do with HA, the edit log rewrite, or any other of the changes being discussed in the thread. In fact, the bug has been there since the "beginning of time", and is present in Hadoop 1.0.x as well (which is why the JIRA is still open). You simply need to pass a non-canonicalized path via the Path(URI) constructor, and you'll see the same behavior in every release including 1.0.x, 0.20.x, or earlier. The reason it shows up more often in Hadoop 2 is actually the FsShell rewrite -- not any changes in HDFS itself, and certainly not related to HA like you've implied here.

The other bug causes blocksBeingWritten to disappear upon upgrade. This, also, had nothing to do with any of the features being discussed in this thread, and in fact only impacts a cluster which is taken down _uncleanly_ prior to an upgrade. Upon starting the upgraded cluster, the user would be alerted to the missing blocks and could roll back with no lost data. So, while it should be fixed (and has been), I wouldn't consider it particularly frightening. Most users I am aware of do a "clean" shutdown of services like HBase before trying to upgrade their cluster, and, worst case, they would see the issue immediately after the upgrade and perform a rollback with no adverse effects.

In branch-1, however, I've seen other bugs that I'd consider much more scary. Two in particular come to mind and together represent the vast majority of cases in which we've seen customers experience data corruption: HDFS-3652 and HDFS-2305. These two bugs were branch-1 only, and never present in Hadoop 2 due to the "edit log rewrite" project (HDFS-1073).

So, at risk of this thread just becoming a laundry list of bugs that have existed in HDFS, or a list of bugs in YARN, I'll summarize: I still think that YARN is "alpha" and HDFS 2 is at least as "stable" as Hadoop 1.0. We have customers running it for production workloads, in multi-rack clusters, with great success. But this has nothing to do with this thread at hand, so I'll raise the question of alpha/beta/stable labeling in the context of our next release vote, and hope we can go back to the more fruitful discussion of how to encourage large feature development while maintaining stability.

Thanks
-Todd

On Sun, Sep 2, 2012 at 3:11 PM, Arun Murthy wrote:
> Eli,
>
> On Sep 2, 2012, at 1:01 PM, Eli Collins wrote:
>
>> On Sat, Sep 1, 2012 at 12:47 PM, Arun C Murthy wrote:
>>> Todd,
>>>
>>> On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:
>>>
>>>> I'd actually contend that YARN was merged too early. I have yet to see
>>>> anyone running YARN in production, and it's holding up the "Stable"
>>>> moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
>>>> I'm seeing fewer issues in our customers running Hadoop HDFS 2
>>>> compared to Hadoop 1-derived code.
>>>
>>> You know I respect you a ton, but I'm very saddened to see you perpetuate
>>> this FUD on our public lists. I expected better, particularly when everyone
>>> is working towards the same goals of advancing Hadoop-2. This sniping on
>>> other members doing work is, um, I'll just stop here rather than regret
>>> later.
>>
>> 2. HDFS is more mature than YARN. Not a surprise given that we all
>> agree YARN is alpha, and a much newer project than HDFS that hasn't
>> yet been deployed in production environments yet (to my knowledge).
>
> Let's focus on the ground reality here.
>
> Please read my (or Rajiv's) message again about YARN's current
> stability and how much it's baked, it's deployment plans to a very
> large cluster in a few *days*. Or, talk to the people developing,
> testing and supporting these customers and clusters.
>
> I'll repeat - YARN has clearly baked much more than HDFS HA given
> the basic bugs (upgrade, edit logs corruption etc.) we've seen after
> being declared *done*; but then we just disagree since clearly I'm
> more conservative. Also, we need to be more conservative wrt HDFS -
> but then what would I know...
>
> I'll admit it's hard to discuss with someone (or a collective) who
> just repeat themselves. Plus, I broke my own rule about email this
> weekend - so, I'll try harder.
>
> Arun

--
Todd Lipcon
Software Engineer, Cloudera

Re: Large feature development
Eli,

On Sep 2, 2012, at 1:01 PM, Eli Collins wrote:

> On Sat, Sep 1, 2012 at 12:47 PM, Arun C Murthy wrote:
>> Todd,
>>
>> On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:
>>
>>> I'd actually contend that YARN was merged too early. I have yet to see
>>> anyone running YARN in production, and it's holding up the "Stable"
>>> moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
>>> I'm seeing fewer issues in our customers running Hadoop HDFS 2
>>> compared to Hadoop 1-derived code.
>>
>> You know I respect you a ton, but I'm very saddened to see you perpetuate
>> this FUD on our public lists. I expected better, particularly when everyone
>> is working towards the same goals of advancing Hadoop-2. This sniping on
>> other members doing work is, um, I'll just stop here rather than regret
>> later.
>
> 2. HDFS is more mature than YARN. Not a surprise given that we all
> agree YARN is alpha, and a much newer project than HDFS that hasn't
> yet been deployed in production environments yet (to my knowledge).

Let's focus on the ground reality here.

Please read my (or Rajiv's) message again about YARN's current stability and how much it's baked, and its deployment plans to a very large cluster in a few *days*. Or, talk to the people developing, testing and supporting these customers and clusters.

I'll repeat - YARN has clearly baked much more than HDFS HA given the basic bugs (upgrade, edit logs corruption etc.) we've seen after being declared *done*; but then we just disagree since clearly I'm more conservative. Also, we need to be more conservative wrt HDFS - but then what would I know...

I'll admit it's hard to discuss with someone (or a collective) who just repeat themselves. Plus, I broke my own rule about email this weekend - so, I'll try harder.

Arun

Re: Large feature development
On Sat, Sep 1, 2012 at 12:47 PM, Arun C Murthy wrote:
> Todd,
>
> On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:
>
>> I'd actually contend that YARN was merged too early. I have yet to see
>> anyone running YARN in production, and it's holding up the "Stable"
>> moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
>> I'm seeing fewer issues in our customers running Hadoop HDFS 2
>> compared to Hadoop 1-derived code.
>
> You know I respect you a ton, but I'm very saddened to see you perpetuate
> this FUD on our public lists. I expected better, particularly when everyone
> is working towards the same goals of advancing Hadoop-2. This sniping on
> other members doing work is, um, I'll just stop here rather than regret later.

Todd is just saying that:

1. HDFS v2 has fewer critical bugs than v1 (mostly thanks to the edit log rewrite, which aside from HA was motivated by all the quality issues the v1 code has had)
2. HDFS is more mature than YARN. Not a surprise given that we all agree YARN is alpha, and a much newer project than HDFS that hasn't yet been deployed in production environments (to my knowledge).

I don't read this as a snipe against anyone coding on Hadoop; it's just that the two sub-projects are at different stages in their life and development.

Thanks,
Eli

Re: Large feature development
On Sun, Sep 2, 2012 at 7:58 AM, Steve Loughran wrote:
> On 1 September 2012 09:20, Todd Lipcon wrote:
>
>> Thanks for starting this thread, Steve. I think your points below are
>> good. I've snipped most of your comment and will reply inline to one
>> bit below:
>>
>> On Fri, Aug 31, 2012 at 10:07 AM, Steve Loughran
>> wrote:
>>
>> > How then do we get (a) more dev projects working and integrated by the
>> > current committers, and (b) a process in which people who are not yet
>> > contributors/committers can develop non-trivial changes to the project
>> > in a way that it is done with the knowledge, support and mentorship of
>> > the rest of the community?
>>
>
> Both HDFS2 and MRv2 are in trunk, therefore I consider them successes.
>
>> Here's one proposal, making use of git as an easy way to allow
>> non-committers to "commit" code while still tracking development in
>> the usual places:
>
> This is effectively what people do. I'm less worried about the code side of
> things than the integration and mentoring
>
>> - Upon anyone's request, we create a new "Version" tag in JIRA.
>
> -1. There are enough versions. There is a "tag" field in JIRA for precisely
> this purpose
>
>> - The developers create an umbrella JIRA for the project, and file the
>> individual work items as subtasks (either up front, or as they are
>> developed if using a more iterative model)
>
> as today
>
>> - On the umbrella, they add a pointer to a git branch to be used as
>> the staging area for the branch. As they develop each subtask, they
>> can use the JIRA to discuss the development like they would with a
>> normally committed JIRA, but when they feel it is ready to go (not
>> requiring a +1 from any committer) they commit to their git branch
>> instead of the SVN repo.
>
> some integration w/ jenkins and pull testing would be good here
>
>> - When the branch is ready to merge, they can call a merge vote, which
>> requires +1 from 3 committers, same as a branch being proposed by an
>> existing committer. A committer would then use git-svn to merge their
>> branch commit-by-commit, or if it is less extensive, simply generate a
>> single big patch to commit into SVN.
>>
>> My thinking is that this would provide a low-friction way for people
>> to collaborate with the community and develop in the open, without
>> having to work closely with any committer to review every individual
>> subtask.
>>
>> Another alternative, if people are reluctant to use git, would be to
>> add a "sandbox/" repository inside our SVN, and hand out commit bit to
>> branches inside there without any PMC vote. Anyone interested in
>> contributing could request a branch in the sandbox, and be granted
>> access as soon as they get an apache SVN account.
>
> I don't see the technical issues with how the merge is done as the main
> problem.
>
> The barriers to getting your stuff in are
> 1. getting people to care enough to help develop the feature -mentorship,
> collaborative development.
> 2. getting incremental parts in to avoid the continual
> merge-regression-test hell that you go through if you are trying to keep a
> separate branch alive. It's not the technical aspects of the merge so much
> as the need to run all the hadoop tests and your own test suite, and track
> down whether a failure is a regression in -trunk or something in your code.
>
> Jun's patch is an example of this situation. We haven't seen the effort he
> and his colleagues have done with merge and test, but I'm confident it's
> been there. What they now have is a "big bang" class of patch which is so
> big that anyone reviewing it would have to spend a couple of weeks going
> through the codebase trying to understand it. Which as we all know means
> two weeks not doing all the things you are committed to doing.
>
> We know it's there, we know it's current -so how to use this as an exercise
> in something to pull in incrementally?

Jun's patches from HADOOP-8468 (which were developed on a private github repo) are being pulled into trunk incrementally; there's no feature branch (which I think would have been a better route, but at least the current approach has not prevented some progress).

All the recent examples of features that I can think of that have been developed upstream first at Apache on feature branches have gone well.

Thanks,
Eli

Re: Large feature development
On 1 September 2012 09:20, Todd Lipcon wrote:

> Thanks for starting this thread, Steve. I think your points below are
> good. I've snipped most of your comment and will reply inline to one
> bit below:
>
> On Fri, Aug 31, 2012 at 10:07 AM, Steve Loughran
> wrote:
>
> > How then do we get (a) more dev projects working and integrated by the
> > current committers, and (b) a process in which people who are not yet
> > contributors/committers can develop non-trivial changes to the project
> > in a way that it is done with the knowledge, support and mentorship of
> > the rest of the community?
>

Both HDFS2 and MRv2 are in trunk, therefore I consider them successes.

> Here's one proposal, making use of git as an easy way to allow
> non-committers to "commit" code while still tracking development in
> the usual places:

This is effectively what people do. I'm less worried about the code side of things than the integration and mentoring.

> - Upon anyone's request, we create a new "Version" tag in JIRA.

-1. There are enough versions. There is a "tag" field in JIRA for precisely this purpose.

> - The developers create an umbrella JIRA for the project, and file the
> individual work items as subtasks (either up front, or as they are
> developed if using a more iterative model)

As today.

> - On the umbrella, they add a pointer to a git branch to be used as
> the staging area for the branch. As they develop each subtask, they
> can use the JIRA to discuss the development like they would with a
> normally committed JIRA, but when they feel it is ready to go (not
> requiring a +1 from any committer) they commit to their git branch
> instead of the SVN repo.

Some integration w/ jenkins and pull testing would be good here.

> - When the branch is ready to merge, they can call a merge vote, which
> requires +1 from 3 committers, same as a branch being proposed by an
> existing committer. A committer would then use git-svn to merge their
> branch commit-by-commit, or if it is less extensive, simply generate a
> single big patch to commit into SVN.
>
> My thinking is that this would provide a low-friction way for people
> to collaborate with the community and develop in the open, without
> having to work closely with any committer to review every individual
> subtask.
>
> Another alternative, if people are reluctant to use git, would be to
> add a "sandbox/" repository inside our SVN, and hand out commit bit to
> branches inside there without any PMC vote. Anyone interested in
> contributing could request a branch in the sandbox, and be granted
> access as soon as they get an apache SVN account.

I don't see the technical issues with how the merge is done as the main problem.

The barriers to getting your stuff in are:

1. Getting people to care enough to help develop the feature: mentorship, collaborative development.
2. Getting incremental parts in, to avoid the continual merge-regression-test hell that you go through if you are trying to keep a separate branch alive. It's not the technical aspects of the merge so much as the need to run all the hadoop tests and your own test suite, and track down whether a failure is a regression in trunk or something in your code.

Jun's patch is an example of this situation. We haven't seen the effort he and his colleagues have put into merging and testing, but I'm confident it's been there. What they now have is a "big bang" class of patch which is so big that anyone reviewing it would have to spend a couple of weeks going through the codebase trying to understand it. Which, as we all know, means two weeks not doing all the things you are committed to doing.

We know it's there, we know it's current, so how do we use this as an exercise in something to pull in incrementally?

-Steve