Hope to have time to write up some more thoughts later, but some interesting reading is this document from Linux on how to contribute to that project: https://github.com/mirrors/linux-2.6/blob/master/Documentation/SubmittingPatches
Worth looking at other projects' guidelines to form our own if we're thinking of going this route.

-Todd

On Wed, Sep 5, 2012 at 4:43 PM, Jesse Yates <jesse.k.ya...@gmail.com> wrote:

> On Wed, Sep 5, 2012 at 3:58 PM, Elliott Clark <ecl...@stumbleupon.com> wrote:
>
>> +1 on git, either on github or closer to the linux model with real
>> distributed repos.
>>
>> - I've been using it for just about all of my development and it works
>> pretty nicely. I push everything to github as I'm working. Then I
>> squash commits and create a diff to post on jira.
>>
>
> I do the same, just locally. Solid model.
>
>
>> - I would suggest that since hbase's code base moves so rapidly, a
>> rebased branch should probably be a requirement before merging.
>> Otherwise the merge will get pretty interesting for very long lived
>> branches.
>>
>
> IIRC when Todd was working on some large stuff for HDFS he was doing this
> in a feature branch every few days. Seriously helps with when things are
> actually finished in terms of rolling it back in.
>
> Using github to keep a constantly rebased version (every few days) would be
> a reasonable, super-low friction way of solving the problem for
> non-committers. Further, for big changes, it would ensure that if the
> people go away we aren't left with a bunch of dangling branches in the svn.
> Problem here is also establishing the 'master' branch in github, though
> that can be established on a case-by-case basis with the people involved.
>
>>
>> On Wed, Sep 5, 2012 at 11:38 AM, Jonathan Hsieh <j...@cloudera.com> wrote:
>> > This has been brought up in the past but we are here again.
>> >
>> > We have a few large features that are hanging out and having a hard time
>> > because trunk changes underneath it and in some cases because they are
>> > being worked by folks without a commit bit.
>> > (ex: snapshots w/ Jesse and
>> > Matteo, and have some other potentially in the pipeline -- major
>> > assignment
>> >
>
> I'm generally opposed to doing feature branches for a variety of reasons
> (left behind functionality, hard to roll back in, difficulty of testing,
> etc) and further don't really feel it's necessary for the snapshot
> code given that the code doesn't touch all that much of the current
> codebase.
>
> A lot of the pain with it right now is that the code has been broken into 5
> patches, making it hard to build a version of HBase that has snapshots 'in
> its current form'. This gets even worse as I'm planning on doing a bit more
> refactoring into a couple more patches to help make it more digestible
> (e.g. see latest patch for 3PC https://reviews.apache.org/r/6592/ which
> pulls out a lot of the coordination functionality). This helps with
> reviews, etc, but makes it a bit of a pain for people who want to do
> advanced testing on the feature. It's hard to justify doing a lot of that
> work, though: if the code is changing a lot, then testing doesn't make much
> sense.
>
> In terms of how the work is breaking down, with Matteo doing restore on top
> of the taking that I'm working on, his part clearly depends on the taking
> of snapshots. However, the filesystem layout hasn't changed at all in
> nearly the last two months, meaning the work can proceed pretty much
> independently (more or less).
>
>
>> > manager changes with Jimmy and possibly me,
>> >
>
> This is a lot more high-touch with the codebase, making a branch (either in
> sandbox or otherwise) more feasible.
>
>
>> > HBASE-4120, HBASE-2600,
>> > removing root)
>> >
>
> Salesforce is planning on tackling at least the latter two in the next few
> months, so this is something that we need to figure out :)
>
>
>> >
>> > Though I wasn't around yet, it seems like this is what we did for
>> > coprocs/security, probably for the 0.90 master.
>> >
>> > http://search-hadoop.com/m/byzZYZMktx1/hbase+windows&subj=Re+Proposed+feature+branch+for+HBase+security
>> >
>> > Were the folks working on those features committers at the time? What do
>> > we do for contributions from folks who aren't committers yet?
>> >
>> > This was proposed over on hadoop-general by Todd -- what do you all think
>> > about doing something like this for the major changes? (Github seems
>> > easiest, svn seems "more official").
>> >
>> > Here's one proposal, making use of git as an easy way to allow
>> > non-committers to "commit" code while still tracking development in
>> > the usual places:
>> > - Upon anyone's request, we create a new "Version" tag in JIRA.
>> > - The developers create an umbrella JIRA for the project, and file the
>> > individual work items as subtasks (either up front, or as they are
>> > developed if using a more iterative model)
>> > - On the umbrella, they add a pointer to a git branch to be used as
>> > the staging area for the branch. As they develop each subtask, they
>> > can use the JIRA to discuss the development like they would with a
>> > normally committed JIRA, but when they feel it is ready to go (not
>> > requiring a +1 from any committer) they commit to their git branch
>> > instead of the SVN repo.
>> > - When the branch is ready to merge, they can call a merge vote, which
>> > requires +1 from 3 committers, same as a branch being proposed by an
>> > existing committer. A committer would then use git-svn to merge their
>> > branch commit-by-commit, or if it is less extensive, simply generate a
>> > single big patch to commit into SVN.
>> >
>
> Overall, this seems reasonable. I can imagine the work to merge back in
> being a huge pain. It would be great to see if we can break down these big
> changes into smaller patches and roll them in one at a time.
> Both in terms of ease on a single committer and in helping to ensure code
> quality of each sub-piece; it's easier to enforce good testing on smaller
> pieces, and it helps with code reuse.
>
> My comments above obviously contradict this a little bit - it's a huge pain
> to work on the end functionality when the sub-pieces that you are building
> on shift due to code reviews. In the end it leads to a better foundation,
> but can be a headache to keep everything in sync.
>
> The latter goes away a bit if we have a single branch with the majority of
> the code then progressive commits to fix things, but that first massive
> code drop is still terrible to review (pot calling the kettle black here).
>
> TL;DR prefer smaller, independently useful patches that build to the bigger
> change. It may not be possible for some features, but it should make it
> easier to review, roll in, and in the end merge the final change while
> being more generally useful.
>
>
>> >
>> > Another alternative, if people are reluctant to use git, would be to
>> > add a "sandbox/" repository inside our SVN, and hand out commit bit to
>> > branches inside there without any PMC vote. Anyone interested in
>> > contributing could request a branch in the sandbox, and be granted
>> > access as soon as they get an apache SVN account.
>> >
>> >
>
> This seems a little excessive. It would be nice for the more 'official'
> status this confers, but seems to create more friction than it's worth
> (IMO).
>
>
> TL;DR github with 'official' branches per umbrella JIRA seems a
> low-friction way to do feature branches without the possibility of cruft in
> the main repository. We should really be sure that we need a branch, though,
> and still favor smaller patches along the same branch for generally
> useful features.
>
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com

--
Todd Lipcon
Software Engineer, Cloudera