On Wed, Sep 5, 2012 at 4:43 PM, Jesse Yates <[email protected]> wrote:
> On Wed, Sep 5, 2012 at 3:58 PM, Elliott Clark <[email protected] > >wrote: > > > - I would suggest that since hbase's code base moves so rapidly, a > > rebased branch should probably be a requirement before merging. > > Otherwise the merge will get pretty interesting for very long lived > > branches. > > > > IIRC when Todd was working on some large stuff for HDFS he was doing this > in a feature branch every few days. Seriously helps with when things are > actually finished in terms of rolling it back in. > > Using github to keep a constantly rebased version (every few days) would be > a reasonble, super-low friction way of solving the problem for > non-committers. Further, for big changes, it would ensure that if the > people go away we aren't left with a bunch of dangling branches in the svn. > Problem here is also establishing the 'master' branch in github, though > that can be established on a case-by-case basis with the people involved. > > Have a work branch and then have the "committed" branch in github? > > > > On Wed, Sep 5, 2012 at 11:38 AM, Jonathan Hsieh <[email protected]> > wrote: > > > This has been brought up in the past but we are here again. > > > > > > We have a few large features that are hanging out and having a hard > time > > > because trunk changes underneath it and in some cases because they are > > > being worked by folks without a commit bit. (ex: snapshots w/ Jesse > and > > > Matteo, and have some other potentially in the pipeline -- major > > assignment > > > > I'm generally opposed to doing feature branches for a variety of reasons > (left behind functionality, hard to roll back in, difficulty of testing, > etc) and further don't really feel its really necessary for the snapshot > code given that the code doesn't touch all that much of the current > codebase. > > I realize it is hard to split all the code up but it is also hard and time consuming to review the design and implementation with a single 300k patch that has many modes. I'll post some other more concrete suggestions there. > A lot of the pain with it right now is that the code has been broken into 5 > patches, making it hard to build a version of HBase that has snapshots 'in > its current form'. This gets even worse as I'm planning on doing a bit more > refactoring into a couple more patches to help make it more digestable > (e.g. see latest patch for 3PC https://reviews.apache.org/r/6592/ which > pulls out a lot of the coordination functionality)). This helps with > reviews, etc, but makes it a bit of a pain for people who want to do > advanced testing on the feature - hard to justify doing a lot of that work > though as if the code is changing a lot, then testing doesn't make much > sense. > > My hope with splitting parts of it was that we could peal off and commit portions we are agree upon and have something solid and committed to build the rest on top of. If also keeps the subsystems decoupled which is a good thing; this limits the scope of what needs to be understood, so that reviewing/modifying it only requires localized knowledge instead of global knowledge. This process would likely add tests at the internals layers and make these subsystems testable. >From a reviewer's point of view -- I'm going to ignore the things built on top until we get the building blocks to a place where it is easy to follow. > In terms of how the work is breaking down, with Matteo doing restore on top > of the taking that I'm working on, his part clearly depends on the taking > of snapshots. However, the filesystem layout hasn't changed at all in > nearly the last two months, meaning the work can proceed pretty much > independently (more or less). > > This is one case where the design docs did a really good job of defining the structures ahead of time. The spec is laid out and we just need to confirm that the code puts things in those places. :) > > > manager changes with Jimmy and possibly me, > > > > This is a lot more high-touch with the codebase, making a branch (either in > sandbox or otherwise) more feasible. > > > > HBASE-4120, HBASE-2600, > > > removing root) > > > > Salesforce is planning on tackling at least the latter two in the next few > months, so this is something that we need to figure out :) > > Hooray! > > ... > Overall, this seems reasonable. I can imagine the work to merge back in > being a huge pain. It would be great to see if we can break down these big > changes into smaller patches and roll them in one at a time. Both in terms > of ease on a single committer as helping to ensure code quality of each > sub-piece; its easier to enforce good testing on smaller pieces and helps > with code reuse. > > My comments above obviously contradict this a little bit - its a huge pain > to work on the end functionality when the sub-pieces that you are building > on shift due to code reviews. In the end it leads to a better foundation, > but can be headache to keep everything in sync. > > The latter goes away a bit if we have a single branch with the majority of > the code then progressive commits to fix things, but still is terrible to > review (pot calling the kettle black here) that first massive code drop. > > There is another philosophy out there -- commit it but have the feature disabled by default until it has been stabilized. We've tried this in a few places (wal compress, off heap cache, instant schema change) and these features unfortunately haven't gone through the "stabilitzation". Maybe after Lars's push for more frequent releases proves it self, we could go into that camp. I personally don't like the idea of features that are in but doesn't always work or doesn't work in conjunction with other perfectly reasonable seemingly orthogonal features. > TL;DR prefer smaller, independently useful patches that build to the bigger > change. Its may not be possible for some features, but should make it > easier to review, roll in, and in the end merge the final change while > being more generally useful. > > Architectural changes -- like reversing meta and removing root will likely be things we can't really do incrementally. These should be rare. -- // Jonathan Hsieh (shay) // Software Engineer, Cloudera // [email protected]
