Hi Folks,

I'm pleased to announce that after some reflection, Yahoo! has decided to 
discontinue the "Yahoo! Distribution of Hadoop" and focus on Apache Hadoop. 
 We plan to remove all references to a Yahoo distribution from our website 
(developer.yahoo.com/hadoop), close our github repo 
(yahoo.github.com/hadoop-common) and focus on working more closely with the 
Apache community.  Our intent is to return to helping Apache produce binary 
releases of Apache Hadoop that are so bulletproof that Yahoo and other 
production Hadoop users can run them unpatched on their clusters.

Until Hadoop 0.20, Yahoo committers worked as release masters to produce binary 
Apache Hadoop releases that the entire community used on their clusters.    As 
the community grew, we experimented with using the "Yahoo! Distribution of 
Hadoop" as the vehicle to share our work.  Unfortunately, Apache is no longer 
the obvious place to go for Hadoop releases.  The Yahoo! team wants to return 
to a world where anyone can download and directly use releases of Hadoop from 
Apache.  We want to contribute to the stabilization and testing of those 
releases.  We also want to share our regular program of sustaining engineering 
that backports minor feature enhancements into new dot releases on a regular 
basis, so that the world sees regular improvements coming from Apache every few 
months, not years.

Recently the Apache Hadoop community has been very turbulent.  Over the last 
few months we have been developing Hadoop enhancements in our internal git 
repository while doing a complete review of our options. Our commitment to open 
sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd), but the 
future of the "Yahoo distribution of Hadoop" was far from clear.  We've 
concluded that focusing on Apache Hadoop is the way forward.  We believe that 
more focus on communicating our goals to the Apache Hadoop community, and more 
willingness to compromise on how we get to those goals, will help us get back 
to making Hadoop even better.

Unfortunately, we now have to sort out how to contribute several person-years' 
worth of work to Apache so that we can unwind the Yahoo! git repositories.  We 
currently run two lines of Hadoop development, our sustaining program 
(hadoop-0.20-sustaining) and hadoop-future.  Hadoop-0.20-sustaining is the 
stable version of Hadoop we currently run on Yahoo's 40,000 nodes.  It contains 
a series of fixes and enhancements that are all backwards compatible with our 
"Hadoop 0.20 with security".  It is our most stable and high performance 
release of Hadoop ever.  We've expended a lot of energy finding and fixing bugs 
in it this year. We have initiated the process of contributing this work to 
Apache in the branch: hadoop/common/branches/branch-0.20-security.  We've 
proposed calling this the 20.100 release.  Once folks have had a chance to try 
this out and we've had a chance to respond to their feedback, we plan to create 
20.100 release candidates and ask the community to vote on making them Apache 
releases. 

Hadoop-future is our new feature branch.  We are working on a set of new 
features for Hadoop to improve its availability, scalability and 
interoperability to make Hadoop more usable in mission critical deployments. 
You're going to see another burst of email activity from us as we work to get 
hadoop-future patches socialized, reviewed and checked in.  These bulk checkins 
are exceptional.  They are the result of us striving to be more transparent.  
Once we've merged our hadoop-future and hadoop-0.20-sustaining work back into 
Apache, folks can expect us to return to our regular development cadence.  
Looking forward, we plan to socialize our roadmaps regularly, actively 
synchronize our work with other active Hadoop contributors and develop our code 
collaboratively, directly in Apache.

In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop" is 
a commitment to working more effectively with the Apache Hadoop community.  Our 
goal is to make Apache Hadoop THE open source platform for big data.

Thanks,

E14

--

PS Here is a draft list of key features in hadoop-future:

* HDFS-1052 - Federation, the ability to support much more storage per Hadoop 
cluster.

* HADOOP-6728 - The new metrics framework

* MAPREDUCE-1220 - Optimizations for small jobs

---
PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W
