Re: branching Hive and getting to first release

Jeff Hammerbacher Thu, 26 Mar 2009 11:19:49 -0700

Hey,

What's the state of the release process? We'd really, really like to see a
Hive release. As Nigel said on the Pig list, releasing often is good for the
community.


Thanks,
Jeff

On Tue, Mar 10, 2009 at 2:03 PM, Zheng Shao <zsh...@gmail.com> wrote:

> So far the discussion has been mainly about policy.
>
> However, another important thing is what we plan to work on in the next 2-3
> weeks. Let's say we are focusing on fixing bugs (both Facebook users and
> open-source community users are crying for bug fixes, from what I can see).
> After 2-3 weeks when most bugs are fixed and Hive is more stable, we can
> come back to review this and decide the policy. Given the additional
> information we will have by then, we might be able to make a better
> decision on policies.
>
> Also, if we are focusing on fixing bugs for the next 2-3 weeks, there is no
> point in making a 0.3 branch right now because every bug fix will go into
> both 0.3 and trunk anyway.
>
> Let's fix most of the important bugs first, then make the 0.3 branch, then we
> can work on 2 things at the same time: new features/perf improvements that
> go only to trunk, and other minor bug fixes that go to both 0.3 and trunk.
>
> Thoughts?
>
> Zheng
>
> On Tue, Mar 10, 2009 at 1:09 PM, Ashish Thusoo <athu...@facebook.com>
> wrote:
>
> > Agreed.
> >
> > I think we moved to trunk because of lazy serde from what Zheng tells me
> (I
> > was out of office when this happened)...
> >
> > Regarding performance fixes, I would rather categorize performance
> > regressions as blocker bugs and keep performance improvements as
> features.
> > By that measure I think lazy serde was fine as a feature. I think we
> should
> > just have let 0.2 stabilize and deployed lazy serde when we released 0.2
> and
> > cut out a 0.3 branch and moved our systems to 0.3. Keeping the criteria
> for
> > what gets categorized as a blocker tight is quite critical otherwise we
> will
> > always be in danger of a constant feature creep and that would totally
> > defeat the purpose of stabilization. In any case if we had been able to
> > stabilize in a month's time, say, for 0.2, I do not think the users would be
> > too unhappy to get the lazy serde a month late. So by that token I
> would
> > not categorize it to be a blocker as such.
> >
> > One constant problem is that the best stress testing environment that we
> > have for Hive right now is our production work load at FB. So I am not
> sure
> > whether we can have a certificate of stability to a branch if we at FB
> pull
> > in patches and run a version that is different from the release. Though
> of
> > course others are always free to get the patches from the JIRA and apply
> > them as they see fit. I am not sure how to address this. Thoughts?
> >
> > Ashish
> >
> > -----Original Message-----
> > From: Joydeep Sen Sarma [mailto:jssa...@facebook.com]
> > Sent: Tuesday, March 10, 2009 11:37 AM
> > To: hive-dev@hadoop.apache.org
> > Subject: RE: branching Hive and getting to first release
> >
> > I am in general agreement - but the problem is that the mail below doesn't
> > explain why trunk was deployed.
> >
> > Performance fixes are like critical bugs. We cannot run a production
> > cluster that's hurting for performance on non-performant software. To
> that
> > extent - it was a mistake for us to consider lazyserde to be a 'feature'
> > (which is why we didn't back-port it to 0.2). so is hive-223 for example
> -
> > we just need to have it asap in deployment - and by conventional
> definition
> > - it certainly wasn't a regression that would go into a bug fix branch. I
> > suspect there may be more such jiras.
> >
> > One way of looking at this is that we either branched too early, or we
> need
> > to reconsider what goes into a branch.
> >
> > The other way to look at this is that every cluster administrator
> > (including the one at Facebook - who is just like any user of Hive) -
> needs
> > to have the option to pull in latest patches that are critical to his/her
> > deployment. The success of Hive and the happiness of its internal
> Facebook
> > users should not and cannot be at odds with each other.
> >
> >
> > -----Original Message-----
> > From: Ashish Thusoo [mailto:athu...@facebook.com]
> > Sent: Tuesday, March 10, 2009 11:08 AM
> > To: hive-dev@hadoop.apache.org
> > Subject: RE: branching Hive and getting to first release
> >
> > I think a big reason for what killed 0.2 was the fact that we decided to
> > deploy trunk into production because of some features that the internal
> > users were asking for, instead of just continuing with the 0.2 branch.
> What
> > I want to stress is that we cannot do that going forward. Once we branch
> out
> > 0.3, we have to let 0.3 soak in production till we have at least 2 weeks
> of
> > run with no blockers (I did not mean that we will just certify a branch
> to
> > be a release after 2 weeks - what I meant was that we have at least 2
> weeks
> > of run with no blockers) before we cut out a release from the branch.
> Again
> > I must stress that we have to continue deploying the candidate branch
> into
> > production and we cannot move the production machines to trunk as that
> will
> > completely kill the branch (as happened with 0.2). We have to really
> isolate
> > blocker bug fixes from features and we have to understand that we cannot
> > roll out features overnight (as we have done so far for our users at FB)
> as
> > doing that will make it absolutely hopeless in getting any branch stable.
> >
> > Having said that, we could move to a model where we make a new branch
> (not
> > a release) from trunk once the previous candidate branch is released
> instead
> > of having a train of branches at every 2 weeks. I am fine with that too.
> > What is perhaps more critical is that we have a firm commitment that we
> are
> > not going to deploy new features into production till we stabilize 0.3
> and
> > we should set the expectations accordingly...
> >
> > Ashish
> >
> > -----Original Message-----
> > From: Johan Oskarsson [mailto:jo...@oskarsson.nu]
> > Sent: Tuesday, March 10, 2009 9:52 AM
> > To: hive-dev@hadoop.apache.org
> > Subject: Re: branching Hive and getting to first release
> >
> > +1, sounds like a solid plan.
> >
> > Joydeep Sen Sarma wrote:
> > > I am also a little worried about a lot of releases and managing them.
> > perhaps what's clouding my judgement is that there are a lot of critical
> > bugs yet to be fixed - so I don't see how we can stabilize the first
> release
> > in a couple of weeks - or even a month (which is what killed 0.2 I think
> to
> > some extent).
> > >
> > > I would say that the first release is somewhat special. We are fixing a
> > boatload of issues from a very large push of code (all of it!). In
> > subsequent releases - there wouldn't be as many bugs - and a faster
> release
> > cycle would be feasible.
> > >
> > > So my vote would be to branch now (before predicate push down), get the
> > release stable as fast as possible (but potentially wait as long as it
> > takes) - and then only start cutting more branches. Over time - we can
> > converge to a faster release cycle - but right now this seems dubious to
> me.
> > >
> > > Can't put a newborn into kindergarten directly man .. :-)
> > >
> > > -----Original Message-----
> > > From: Johan Oskarsson [mailto:jo...@oskarsson.nu]
> > > Sent: Tuesday, March 10, 2009 3:43 AM
> > > To: hive-dev@hadoop.apache.org
> > > Subject: Re: branching Hive and getting to first release
> > >
> > > I'm worried that trying to create a new release every other week will
> > > be too often. Isn't there a risk that we're still fixing bugs in 0.3
> > > when the 0.5 branch is cut if we run into something unexpected?
> > > It seems Hadoop is suffering from this issue a bit lately even though
> > > they branch quarterly, 0.19 still has lots of issues open when people
> > > are committing patches to 0.21 (trunk). Granted Hadoop is a much
> > > larger codebase with more patches applied.
> > >
> > > That said, I won't oppose trying the period suggested and see how it
> > > goes, it's quite easy to change after all.
> > >
> > > /Johan
> > >
> > > Ashish Thusoo wrote:
> > >> For 0.2 we had set a feature freeze date on the 28th of Jan and as I
> > >> had mentioned in the previous email, the plan was to cut a branch on the
> > last Wednesday of every month and then issue a vote for making it a
> release
> > once it ran satisfactorily (no blocker bugs) for at least 2 weeks at
> Facebook.
> > Accordingly I was hoping that we would limit the changes that would go
> into
> > the branch (0.2) in this case to the blocker bugs only but it seems that
> we
> > had some feature creep and as a result we switched to using trunk at
> > Facebook without giving sufficient time for 0.2 to stabilize. It also
> means
> > that perhaps waiting for a month for each release is too long at this
> stage
> > at least for FB. If others are in agreement, how about we do the
> following
> > going forward..
> > >>
> > >>
> > >> Cut a branch every other Wednesday, only check in the most critical
> > blocker bugs into the branch and reserve the features for trunk which
> will
> > be picked up in the next branch and religiously deploy only the versions
> of
> > the branch at FB. We can start off a vote to make a branch an official
> > release once we have at least 2 weeks of run on the branch without any
> > blocker bugs (i.e. we did not have a need to upgrade the production
> machines
> > at FB).
> > >>
> > >> We can start off by creating a 0.3 branch this Wednesday
> accordingly...
> > >>
> > >> Once we have an agreement on this we can document this procedure on
> the
> > wiki and religiously follow it. Without controlling the tendency of a
> > feature creep it would be difficult to get a stable version out...
> > >>
> > >> Thoughts?
> > >>
> > >> Ashish
> > >>
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: Johan Oskarsson [mailto:jo...@oskarsson.nu]
> > >> Sent: Tuesday, March 03, 2009 2:54 AM
> > >> To: hive-dev@hadoop.apache.org
> > >> Subject: Re: branching Hive and getting to first release
> > >>
> > >> To be honest I must've missed that 0.2 was branched (I found the email
> > now though), was there a feature freeze date set?
> > >>
> > >> After branching shouldn't we have moved the non critical issues to 0.3
> > and pushed for fixing the remaining bugs in order to release?
> > >>
> > >> That aside, I don't have a strong opinion whether the next release is
> > >> 0.2 or 0.3, since there hasn't been an Apache release yet. How about
> > setting a feature freeze date now and take it from there?
> > >>
> > >> /Johan
> > >>
> > >> Joydeep Sen Sarma wrote:
> > >>> Hey folks,
> > >>>
> > >>> A few of us were chatting earlier today (some Facebook and Cloudera
> > folks) on the best approach to get to a first Hive release.
> > >>>
> > >>> While 0.2 has been branched - it seems awkward to base the first
> > release on it. The reason is twofold:
> > >>>
> > >>> -          new changes to trunk since 0.2 have been relatively
> > contained AFAIK (so no added instability). As evidence - Facebook has
> > reverted to running trunk in production for the last week or so.
> > >>> -          the changes that have gone into trunk since 0.2 are
> > extremely important from a performance perspective. This includes the
> > LazySerDe that Zheng added and upcoming hive-232.
> > >>>
> > >>> So one proposal is to branch 0.3 at this point and try to make that
> > first official release for Hive.
> > >>>
> > >>> This does look a little haphazard - and the natural question is
> whether
> > we can stick to this (or we end up repeating this once we throw in some
> more
> > goodies). The feeling is that this may be a good time - hive-279 has
> major
> > changes to the hive compiler and branching 0.3 before those changes are
> > checked in gives us a good chance of producing a stable release with good
> > performance (and the major changes will probably prevent us from
> repeating
> > this trick going forward :)).
> > >>>
> > >>> What do people think?
> > >>>
> > >>> Joydeep
> > >>>
> > >
> >
> >
>
>
> --
> Yours,
> Zheng
>
