Thanks for starting this thread, Steve. I think your points below are good. I've snipped most of your comment and will reply inline to one bit below:
On Fri, Aug 31, 2012 at 10:07 AM, Steve Loughran <steve.lough...@gmail.com> wrote:

> Of the big changes that have worked, they are
>
>
>    1. HDFS 2's HA and ongoing improvements: collaborative dev on the list
>    with incremental changes going on in trunk, RTC with lots of tests. This
>    isn't finished, and the test problem there is that functional testing of
>    all failure modes requires software-controlled fencing devices and switches
>    -and tests to generated the expected failure space.

Actually, most of the HDFS HA code has been done on branches. The first work that led towards HA was the redesign of the edits logging infrastructure -- HDFS-1073. This was a feature branch with about 60 patches on it. Then HDFS-1623, the main manual-failover HA development, had close to 150 patches on the branch. Automatic HA (HDFS-3042) was some 15-20 patches. The current work (removing the dependency on NAS) is around 35 patches in so far and getting close to merge.

In these various branches, we've experimented with a few policies which have differed from trunk. In particular:

- HDFS-1073 had a "modified review then commit" policy: if a patch sat without a review for more than 24 hours, we committed it, with the restriction that there would be a post-commit review before the branch was merged.
- All of the branches have done away with the requirement of running the full QA suite, findbugs, etc. prior to commit. This means that the branches at times have broken tests checked in, but it also makes it quicker to iterate on the new feature. Again, the assumption is that these requirements are met before merge.
- In all cases there has been a design doc and some good design discussion up front before substantial code was written. This made it easier to forge ahead on the branch with good confidence that the community was on board with the idea.

Given my experiences, I think all of the above are useful to follow.
It means development can happen quickly, but ensures that when the merge is proposed, people feel like the quality meets our normal standards.

>    2. YARN: Arun on his own branch, CTR, merge once mostly stable, and
>    completely replacing MRv1.

I'd actually contend that YARN was merged too early. I have yet to see anyone running YARN in production, and it's holding up the "Stable" moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and I'm seeing fewer issues in our customers running Hadoop HDFS 2 compared to Hadoop 1-derived code.

> How then do we get (a) more dev projects working and integrated by the
> current committers, and (b) a process in which people who are not yet
> contributors/committers can develop non-trivial changes to the project in a
> way that it is done with the knowledge, support and mentorship of the rest
> of the community?

Here's one proposal, making use of git as an easy way to allow non-committers to "commit" code while still tracking development in the usual places:

- Upon anyone's request, we create a new "Version" tag in JIRA.
- The developers create an umbrella JIRA for the project, and file the individual work items as subtasks (either up front, or as they are developed if using a more iterative model).
- On the umbrella, they add a pointer to a git branch to be used as the staging area for the branch. As they develop each subtask, they can use the JIRA to discuss the development like they would with a normally committed JIRA, but when they feel it is ready to go (not requiring a +1 from any committer) they commit to their git branch instead of the SVN repo.
- When the branch is ready to merge, they can call a merge vote, which requires +1 from 3 committers, same as a branch being proposed by an existing committer. A committer would then use git-svn to merge their branch commit-by-commit, or if it is less extensive, simply generate a single big patch to commit into SVN.

My thinking is that this would provide a low-friction way for people to collaborate with the community and develop in the open, without having to work closely with any committer to review every individual subtask.

Another alternative, if people are reluctant to use git, would be to add a "sandbox/" repository inside our SVN, and hand out commit bit to branches inside there without any PMC vote. Anyone interested in contributing could request a branch in the sandbox, and be granted access as soon as they get an apache SVN account.

-Todd
--
Todd Lipcon
Software Engineer, Cloudera