Excellent news! Will you also make Howl, Oozie, and Yarn Apache projects?
On Mon, Jan 31, 2011 at 7:27 PM, Eric Baldeschwieler <[email protected]> wrote:
> Hi Folks,
>
> I'm pleased to announce that after some reflection, Yahoo! has decided to
> discontinue the "Yahoo Distribution of Hadoop" and focus on Apache
> Hadoop. We plan to remove all references to a Yahoo distribution from our
> website (developer.yahoo.com/hadoop), close our github repo
> (yahoo.github.com/hadoop-common) and focus on working more closely with the
> Apache community. Our intent is to return to helping Apache produce binary
> releases of Apache Hadoop that are so bulletproof that Yahoo and other
> production Hadoop users can run them unpatched on their clusters.
>
> Until Hadoop 0.20, Yahoo committers worked as release masters to produce
> binary Apache Hadoop releases that the entire community used on their
> clusters. As the community grew, we have experimented with using the
> "Yahoo! Distribution of Hadoop" as the vehicle to share our work.
> Unfortunately, Apache is no longer the obvious place to go for Hadoop
> releases. The Yahoo! team wants to return to a world where anyone can
> download and directly use releases of Hadoop from Apache. We want to
> contribute to the stabilization and testing of those releases. We also want
> to share our regular program of sustaining engineering that backports minor
> feature enhancements into new dot releases on a regular basis, so that the
> world sees regular improvements coming from Apache every few months, not
> years.
>
> Recently the Apache Hadoop community has been very turbulent. Over the
> last few months we have been developing Hadoop enhancements in our internal
> git repository while doing a complete review of our options. Our commitment
> to open sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd),
> but the future of the "Yahoo distribution of Hadoop" was far from clear.
> We've concluded that focusing on Apache Hadoop is the way forward. We
> believe that more focus on communicating our goals to the Apache Hadoop
> community, and more willingness to compromise on how we get to those goals,
> will help us get back to making Hadoop even better.
>
> Unfortunately, we now have to sort out how to contribute several
> person-years' worth of work to Apache to let us unwind the Yahoo! git
> repositories. We currently run two lines of Hadoop development, our
> sustaining program (hadoop-0.20-sustaining) and hadoop-future.
> Hadoop-0.20-sustaining is the stable version of Hadoop we currently run on
> Yahoo's 40,000 nodes. It contains a series of fixes and enhancements that
> are all backwards compatible with our "Hadoop 0.20 with security". It is
> our most stable and high performance release of Hadoop ever. We've expended
> a lot of energy finding and fixing bugs in it this year. We have initiated
> the process of contributing this work to Apache in the branch:
> hadoop/common/branches/branch-0.20-security. We've proposed calling this
> the 20.100 release. Once folks have had a chance to try this out and we've
> had a chance to respond to their feedback, we plan to create 20.100 release
> candidates and ask the community to vote on making them Apache releases.
>
> Hadoop-future is our new feature branch. We are working on a set of new
> features for Hadoop to improve its availability, scalability and
> interoperability to make Hadoop more usable in mission critical deployments.
> You're going to see another burst of email activity from us as we work to
> get hadoop-future patches socialized, reviewed and checked in. These bulk
> checkins are exceptional. They are the result of us striving to be more
> transparent. Once we've merged our hadoop-future and hadoop-0.20-sustaining
> work back into Apache, folks can expect us to return to our regular
> development cadence. Looking forward, we plan to socialize our roadmaps
> regularly, actively synchronize our work with other active Hadoop
> contributors and develop our code collaboratively, directly in Apache.
>
> In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop"
> is a commitment to working more effectively with the Apache Hadoop
> community. Our goal is to make Apache Hadoop THE open source platform for
> big data.
>
> Thanks,
>
> E14
>
> --
>
> PS Here is a draft list of key features in hadoop-future:
>
> * HDFS-1052 - Federation, the ability to support much more storage per
> Hadoop cluster.
>
> * HADOOP-6728 - A new metrics framework
>
> * MAPREDUCE-1220 - Optimizations for small jobs
>
> ---
> PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W
