Hey,

What's the state of the release process? We'd really, really like to see a Hive release. As Nigel said on the Pig list, releasing often is good for the community.

Thanks,
Jeff

On Tue, Mar 10, 2009 at 2:03 PM, Zheng Shao <zsh...@gmail.com> wrote:

> Till now the discussion is mainly on policy.
>
> However, another important thing is what we plan to work on in the next 2-3 weeks. Let's say we are focusing on fixing bugs (both Facebook users and open-source community users are crying for bug fixes, from what I can see). After 2-3 weeks when most bugs are fixed and Hive is more stable, we can come back to review this and decide the policy. Given the additional information we will have at that time, we might be able to make a better decision on policies.
>
> Also, if we are focusing on fixing bugs for the next 2-3 weeks, there is no point in making a 0.3 branch right now because every bug fix will go into both 0.3 and trunk anyway.
>
> Let's fix most of the important bugs first, then make the 0.3 branch, then we can work on 2 things at the same time: new features/perf improvements that go only to trunk, and other minor bug fixes that go to both 0.3 and trunk.
>
> Thoughts?
>
> Zheng
>
> On Tue, Mar 10, 2009 at 1:09 PM, Ashish Thusoo <athu...@facebook.com> wrote:
>
> > Agreed.
> >
> > I think we moved to trunk because of lazy serde from what Zheng tells me (I was out of office when this happened)...
> >
> > Regarding performance fixes, I would rather categorize performance regressions as blocker bugs and keep performance improvements as features. By that measure I think lazy serde was fine as a feature. I think we should just have let 0.2 stabilize and deployed lazy serde when we released 0.2 and cut out a 0.3 branch and moved our systems to 0.3. Keeping the criteria for what gets categorized as a blocker tight is quite critical; otherwise we will always be in danger of constant feature creep, and that would totally defeat the purpose of stabilization.
> > In any case, if we had been able to stabilize in a month's time, say for 0.2, I do not think the users would be too unhappy to get the lazy serde a month late. So by that token I would not categorize it as a blocker as such.
> >
> > One constant problem is that the best stress testing environment that we have for Hive right now is our production workload at FB. So I am not sure whether we can give a certificate of stability to a branch if we at FB pull in patches and run a version that is different from the release. Though of course others are always free to get the patches from the JIRA and apply them as they see fit. I am not sure how to address this. Thoughts?
> >
> > Ashish
> >
> > -----Original Message-----
> > From: Joydeep Sen Sarma [mailto:jssa...@facebook.com]
> > Sent: Tuesday, March 10, 2009 11:37 AM
> > To: hive-dev@hadoop.apache.org
> > Subject: RE: branching Hive and getting to first release
> >
> > I am in general agreement - but the problem is the mail below doesn't explain why trunk was deployed.
> >
> > Performance fixes are like critical bugs. We cannot run a production cluster that's hurting for performance on non-performant software. To that extent - it was a mistake for us to consider lazyserde to be a 'feature' (which is why we didn't back-port it to 0.2). So is hive-223 for example - we just need to have it asap in deployment - and by conventional definition - it certainly wasn't a regression that would go into a bug fix branch. I suspect there may be more such jiras.
> >
> > One way of looking at this is that we either branched too early, or we need to reconsider what goes into a branch.
> >
> > The other way to look at this is that every cluster administrator (including the one at Facebook - who is just like any user of Hive) - needs to have the option to pull in latest patches that are critical to his/her deployment.
> > The success of Hive and the happiness of its internal Facebook users should not and cannot be at odds with each other.
> >
> >
> > -----Original Message-----
> > From: Ashish Thusoo [mailto:athu...@facebook.com]
> > Sent: Tuesday, March 10, 2009 11:08 AM
> > To: hive-dev@hadoop.apache.org
> > Subject: RE: branching Hive and getting to first release
> >
> > I think a big reason for what killed 0.2 was the fact that we decided to deploy trunk into production because of some features that the internal users were asking for, instead of just continuing with the 0.2 branch. What I want to stress is that we cannot do that going forward. Once we branch out 0.3, we have to let 0.3 soak in production till we have at least 2 weeks of run with no blockers (I did not mean that we will just certify a branch to be a release after 2 weeks - what I meant was that we have at least 2 weeks of run with no blockers) before we cut out a release from the branch. Again I must stress that we have to continue deploying the candidate branch into production and we cannot move the production machines to trunk as that will completely kill the branch (as happened with 0.2). We have to really isolate blocker bug fixes from features and we have to understand that we cannot roll out features overnight (as we have done so far for our users at FB) as doing that will make it absolutely hopeless to get any branch stable.
> >
> > Having said that, we could move to a model where we make a new branch (not a release) from trunk once the previous candidate branch is released instead of having a train of branches every 2 weeks. I am fine with that too. What is perhaps more critical is that we have a firm commitment that we are not going to deploy new features into production till we stabilize 0.3 and we should set the expectations accordingly...
> >
> > Ashish
> >
> > -----Original Message-----
> > From: Johan Oskarsson [mailto:jo...@oskarsson.nu]
> > Sent: Tuesday, March 10, 2009 9:52 AM
> > To: hive-dev@hadoop.apache.org
> > Subject: Re: branching Hive and getting to first release
> >
> > +1, sounds like a solid plan.
> >
> > Joydeep Sen Sarma wrote:
> > > I am also a little worried about a lot of releases and managing them. Perhaps what's clouding my judgement is that there are a lot of critical bugs yet to be fixed - so I don't see how we can stabilize the first release in a couple of weeks - or even a month (which is what killed 0.2 I think to some extent).
> > >
> > > I would say that the first release is somewhat special. We are fixing a boatload of issues from a very large push of code (all of it!). In subsequent releases - there wouldn't be as many bugs - and a faster release cycle would be feasible.
> > >
> > > So my vote would be to branch now (before predicate push down), get the release stable as fast as possible (but potentially wait as long as it takes) - and then only start cutting more branches. Over time - we can converge to a faster release cycle - but right now this seems dubious to me.
> > >
> > > Can't put a newborn into kindergarten directly man .. :-)
> > >
> > > -----Original Message-----
> > > From: Johan Oskarsson [mailto:jo...@oskarsson.nu]
> > > Sent: Tuesday, March 10, 2009 3:43 AM
> > > To: hive-dev@hadoop.apache.org
> > > Subject: Re: branching Hive and getting to first release
> > >
> > > I'm worried that trying to create a new release every other week will be too often. Isn't there a risk that we're still fixing bugs in 0.3 when the 0.5 branch is cut if we run into something unexpected?
> > > It seems Hadoop is suffering from this issue a bit lately even though they branch quarterly: 0.19 still has lots of issues open when people are committing patches to 0.21 (trunk).
> > > Granted, Hadoop is a much larger codebase with more patches applied.
> > >
> > > That said, I won't oppose trying the period suggested and seeing how it goes, it's quite easy to change after all.
> > >
> > > /Johan
> > >
> > > Ashish Thusoo wrote:
> > >> For 0.2 we had set a feature freeze date on the 28th of Jan and as I had mentioned in the previous email, the plan was to cut a branch on the last Wednesday of every month and then issue a vote for making it a release once it ran satisfactorily (no blocker bugs) for at least 2 weeks @ facebook. Accordingly I was hoping that we would limit the changes that would go into the branch (0.2) in this case to the blocker bugs only but it seems that we had some feature creep and as a result we switched to using trunk at facebook without giving sufficient time for 0.2 to stabilize. It also means that perhaps waiting for a month for each release is too long at this stage at least for FB. If others are in agreement, how about we do the following going forward..
> > >>
> > >>
> > >> Cut a branch every other Wednesday, only check in the most critical blocker bugs into the branch and reserve the features for trunk which will be picked up in the next branch and religiously deploy only the versions of the branch at FB. We can start off a vote to make a branch an official release once we have at least 2 weeks of run on the branch without any blocker bugs (i.e. we did not have a need to upgrade the production machines at FB).
> > >>
> > >> We can start off by creating a 0.3 branch this Wednesday accordingly...
> > >>
> > >> Once we have an agreement on this we can document this procedure on the wiki and religiously follow it. Without controlling the tendency toward feature creep it would be difficult to get a stable version out...
> > >>
> > >> Thoughts?
> > >>
> > >> Ashish
> > >>
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: Johan Oskarsson [mailto:jo...@oskarsson.nu]
> > >> Sent: Tuesday, March 03, 2009 2:54 AM
> > >> To: hive-dev@hadoop.apache.org
> > >> Subject: Re: branching Hive and getting to first release
> > >>
> > >> To be honest I must've missed that 0.2 was branched (I found the email now though), was there a feature freeze date set?
> > >>
> > >> After branching shouldn't we have moved the non critical issues to 0.3 and pushed for fixing the remaining bugs in order to release?
> > >>
> > >> That aside, I don't have a strong opinion whether the next release is 0.2 or 0.3, since there hasn't been an Apache release yet. How about setting a feature freeze date now and take it from there?
> > >>
> > >> /Johan
> > >>
> > >> Joydeep Sen Sarma wrote:
> > >>> Hey folks,
> > >>>
> > >>> A few of us were chatting earlier today (some Facebook and Cloudera folks) on best approach to get to a first Hive release.
> > >>>
> > >>> While 0.2 has been branched - it seems awkward to base the first release on it. The reason is twofold:
> > >>>
> > >>> - new changes to trunk since 0.2 have been relatively contained AFAIK (so no added instability). As evidence - Facebook has reverted to running trunk in production for the last week or so.
> > >>> - the changes that have gone into trunk since 0.2 are extremely important from performance perspective. This includes the LazySerDe that Zheng added and upcoming hive-232.
> > >>>
> > >>> So one proposal is to branch 0.3 at this point and try to make that first official release for Hive.
> > >>>
> > >>> This does look a little haphazard - and the natural question is whether we can stick to this (or we end up repeating this once we throw in some more goodies).
> > >>> The feeling is that this may be a good time - hive-279 has major changes to the hive compiler and branching 0.3 before those changes are checked in gives us a good chance of producing a stable release with good performance (and the major changes will probably prevent us from repeating this trick going forward :)).
> > >>>
> > >>> What do people think?
> > >>>
> > >>> Joydeep
> > >>>
> >
>
> --
> Yours,
> Zheng