Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
On Apr 8, 2011, at 11:08 AM, Todd Lipcon wrote:

> These all have patches that are pretty small, and I'd imagine would apply pretty easily to trunk. Let me know if you'd like any help forward-porting.

Thanks Todd, I'm happy to help review etc.

> The other ones, as new features/improvements, I'd agree it makes sense not to waste effort re-implementing them for trunk MR, but rather to make sure they're incorporated in next-gen.

Yep, exactly. Glad to know it makes sense.

thanks,
Arun
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
Thanks Todd, your help with the jiras you IDed would be welcome!

---
E14 - typing on glass

On Apr 8, 2011, at 11:09 AM, "Todd Lipcon" wrote:

> Looking briefly at those, it seems that the ones that are clear bugs (with small fixes) should be put in the current MR implementation:
> MAPREDUCE-2411
> MAPREDUCE-2409
> MAPREDUCE-2418 (maybe)
>
> These all have patches that are pretty small, and I'd imagine would apply pretty easily to trunk. Let me know if you'd like any help forward-porting.
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
On Fri, Apr 8, 2011 at 10:34 AM, Arun C Murthy wrote:

> On Apr 7, 2011, at 4:22 PM, Todd Lipcon wrote:
>
>> Is there a list available of which patches you've made this decision about? I'm curious, for example, about MAPREDUCE-2178 -- as of today, the MR security in trunk has a serious vulnerability. Do we plan on fixing it, or will the answer be that, if anyone needs security, they must update to "MR Next Gen"?
>
> Apologies if my original message was abstruse - I want to ensure that there is no confusion between 'forward-port' and 'merge from yahoo-merge branch'.
>
> Let me try to explain again: there are several forward ports from the hadoop-0.20-2xx (branch-0.20-security) which are complete, including MAPREDUCE-2178. They are currently part of the 'yahoo-merge' branch in MapReduce. These are awaiting a merge into trunk. Trunk (with a few merges from yahoo-merge) will have a complete security implementation.

Ah, OK, I see. That makes sense.

> My message was intended to highlight some small number of features/bugs which are/will-be in hadoop-0.20.2xx. Here is a nearly complete list of such jiras: MAPREDUCE-517, MAPREDUCE-1872, MAPREDUCE-291, MAPREDUCE-2418, MAPREDUCE-2409, MAPREDUCE-2411. I'll check to ensure there aren't others.

Looking briefly at those, it seems that the ones that are clear bugs (with small fixes) should be put in the current MR implementation:

MAPREDUCE-2411
MAPREDUCE-2409
MAPREDUCE-2418 (maybe)

These all have patches that are pretty small, and I'd imagine would apply pretty easily to trunk. Let me know if you'd like any help forward-porting.

The other ones, as new features/improvements, I'd agree it makes sense not to waste effort re-implementing them for trunk MR, but rather to make sure they're incorporated in next-gen.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
Todd,

On Apr 7, 2011, at 4:22 PM, Todd Lipcon wrote:

> Is there a list available of which patches you've made this decision about? I'm curious, for example, about MAPREDUCE-2178 -- as of today, the MR security in trunk has a serious vulnerability. Do we plan on fixing it, or will the answer be that, if anyone needs security, they must update to "MR Next Gen"?

Apologies if my original message was abstruse - I want to ensure that there is no confusion between 'forward-port' and 'merge from yahoo-merge branch'.

Let me try to explain again: there are several forward ports from the hadoop-0.20-2xx (branch-0.20-security) which are complete, including MAPREDUCE-2178. They are currently part of the 'yahoo-merge' branch in MapReduce. These are awaiting a merge into trunk. Trunk (with a few merges from yahoo-merge) will have a complete security implementation.

My message was intended to highlight some small number of features/bugs which are/will-be in hadoop-0.20.2xx. Here is a nearly complete list of such jiras: MAPREDUCE-517, MAPREDUCE-1872, MAPREDUCE-291, MAPREDUCE-2418, MAPREDUCE-2409, MAPREDUCE-2411. I'll check to ensure there aren't others.

Hope that makes sense. Again, apologies for any confusion I've caused.

thanks,
Arun
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
Is there a list available of which patches you've made this decision about? I'm curious, for example, about MAPREDUCE-2178 -- as of today, the MR security in trunk has a serious vulnerability. Do we plan on fixing it, or will the answer be that, if anyone needs security, they must update to "MR Next Gen"?

-Todd

On Thu, Apr 7, 2011 at 3:52 PM, Arun C Murthy wrote:

> We've concluded that it does not make sense for us to port a very small subset of the work from the branch-0.20-security to the Hadoop mainline.

--
Todd Lipcon
Software Engineer, Cloudera
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
On Feb 14, 2011, at 1:34 PM, Arun C Murthy wrote:

> As the final installment in this process, I've started a discussion on us contributing a re-factor of Map-Reduce in https://issues.apache.org/jira/browse/MAPREDUCE-279 .

Hi Folks,

We wanted to share our thoughts around the co-development of the NextGen MapReduce branch (Jira MR-279), maintaining the branch-0.20-security and merging the work on the security branch with trunk. We've concluded that it does not make sense for us to port a very small subset of the work from the branch-0.20-security to the Hadoop mainline. The JIRAs we don't plan to port all affect areas of the mainline that are going to be replaced by work in the NextGen MapReduce branch (http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MR-279/).

We've been working on the NextGen MapReduce branch (MAPREDUCE-279) within Apache for a while now and are excited about its progress. We think that this branch will be a huge improvement in scalability, performance and functionality. We are now confident that we can get it ready for release in the next few months. We believe that the next major release of Apache Hadoop we will test at Yahoo will include the work in this branch and we are committed to merging the NextGen branch into the mainline after the PMC approves the merge.

Meanwhile, we have continued to find and fix bugs on branch-0.20-security and have been working to port that work into the Hadoop mainline. Most of this work is done and we've also brought all the patches in from our github branch into apache subversion, so that it is easy for everyone to see the work remaining. What we've found is that some of the work in branch-0.20-security is in code sections that have been completely replaced / refactored in the NextGen MapReduce branch. Since we are committed to the NextGen branch, we don't think there is any upside in porting this code into portions of mainline we expect to discard. All of these JIRAs will be fixed in the NextGen MapReduce branch and through there ultimately in trunk (assuming the PMC approves the merge).

So at this point it is our intent to not port the JIRAs listed above to trunk, but to wait until we merge NextGen into trunk to resolve these issues there. If you are interested in seeing these issues ported to mainline, let us know. We are happy to help review your patches and explain context to anyone who is interested in doing this work.

Arun and Eric
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
On Feb 11, 2011, at 2:56 PM, Owen O'Malley wrote:

> On Jan 31, 2011, at 7:27 PM, Eric Baldeschwieler wrote:
>
>> Unfortunately, we now have to sort out how to contribute several person-years worth of work to Apache to let us unwind the Yahoo! git repositories. We currently run two lines of Hadoop development, our sustaining program (hadoop-0.20-sustaining) and hadoop-future.
>
> I also plan to start pushing the hadoop-future work into a branch called yahoo-merge(?) as individual commits from our internal git repository. The goal of creating the branch is to enable faster review and discussion. These patches will be individually run through the jira, review, and commit process to be added to trunk.

As the final installment in this process, I've started a discussion on us contributing a re-factor of Map-Reduce in https://issues.apache.org/jira/browse/MAPREDUCE-279 . We have a prototype we'd like to commit to a branch soon, where we look forward to feedback. From there on, we would love to collaborate to get it committed to trunk.

thanks,
Arun
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
On Jan 31, 2011, at 7:27 PM, Eric Baldeschwieler wrote:

> Unfortunately, we now have to sort out how to contribute several person-years worth of work to Apache to let us unwind the Yahoo! git repositories. We currently run two lines of Hadoop development, our sustaining program (hadoop-0.20-sustaining) and hadoop-future.

As Eric mentioned, we have several person years worth of development to contribute to Apache. Arun has started that process by creating the branch-0.20-security branch. I plan on pushing the individual patches to branch-0.20-security-patches and then when it is identical with Arun's branch, I'll rename mine to branch-0.20-security.

I also plan to start pushing the hadoop-future work into a branch called yahoo-merge(?) as individual commits from our internal git repository. The goal of creating the branch is to enable faster review and discussion. These patches will be individually run through the jira, review, and commit process to be added to trunk.

-- Owen
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
Yes. We have been and continue to be firm believers in Apache and the value of Open Source software, as you can see from our track record to date of contributing heavily to Hadoop and donating Pig, ZooKeeper, Avro, etc. We are excited about their potential and we hope others will find them useful too.

ToddP

On 1/31/11 7:44 PM, "Jeff Hammerbacher" <ham...@cloudera.com> wrote:

> Excellent news! Will you also make Howl, Oozie, and Yarn Apache projects as well?
>
> On Mon, Jan 31, 2011 at 7:27 PM, Eric Baldeschwieler <eri...@yahoo-inc.com> wrote:
>
>> Hi Folks,
>>
>> I'm pleased to announce that after some reflection, Yahoo! has decided to discontinue the "The Yahoo Distribution of Hadoop" and focus on Apache Hadoop. We plan to remove all references to a Yahoo distribution from our website (developer.yahoo.com/hadoop), close our github repo (yahoo.github.com/hadoop-common) and focus on working more closely with the Apache community. Our intent is to return to helping Apache produce binary releases of Apache Hadoop that are so bullet proof that Yahoo and other production Hadoop users can run them unpatched on their clusters.
>>
>> Until Hadoop 0.20, Yahoo committers worked as release masters to produce binary Apache Hadoop releases that the entire community used on their clusters. As the community grew, we have experimented with using the "Yahoo! Distribution of Hadoop" as the vehicle to share our work. Unfortunately, Apache is no longer the obvious place to go for Hadoop releases. The Yahoo! team wants to return to a world where anyone can download and directly use releases of Hadoop from Apache. We want to contribute to the stabilization and testing of those releases. We also want to share our regular program of sustaining engineering that backports minor feature enhancements into new dot releases on a regular basis, so that the world sees regular improvements coming from Apache every few months, not years.
>>
>> Recently the Apache Hadoop community has been very turbulent. Over the last few months we have been developing Hadoop enhancements in our internal git repository while doing a complete review of our options. Our commitment to open sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd), but the future of the "Yahoo distribution of Hadoop" was far from clear. We've concluded that focusing on Apache Hadoop is the way forward. We believe that more focus on communicating our goals to the Apache Hadoop community, and more willingness to compromise on how we get to those goals, will help us get back to making Hadoop even better.
>>
>> Unfortunately, we now have to sort out how to contribute several person-years worth of work to Apache to let us unwind the Yahoo! git repositories. We currently run two lines of Hadoop development, our sustaining program (hadoop-0.20-sustaining) and hadoop-future. Hadoop-0.20-sustaining is the stable version of Hadoop we currently run on Yahoo's 40,000 nodes. It contains a series of fixes and enhancements that are all backwards compatible with our "Hadoop 0.20 with security". It is our most stable and high performance release of Hadoop ever. We've expended a lot of energy finding and fixing bugs in it this year. We have initiated the process of contributing this work to Apache in the branch: hadoop/common/branches/branch-0.20-security. We've proposed calling this the 20.100 release. Once folks have had a chance to try this out and we've had a chance to respond to their feedback, we plan to create 20.100 release candidates and ask the community to vote on making them Apache releases.
>>
>> Hadoop-future is our new feature branch. We are working on a set of new features for Hadoop to improve its availability, scalability and interoperability to make Hadoop more usable in mission critical deployments. You're going to see another burst of email activity from us as we work to get hadoop-future patches socialized, reviewed and checked in. These bulk checkins are exceptional. They are the result of us striving to be more transparent. Once we've merged our hadoop-future and hadoop-0.20-sustaining work back into Apache, folks can expect us to return to our regular development cadence. Looking forward, we plan to socialize our roadmaps regularly, actively synchronize our work with other active Hadoop contributors and develop our code collaboratively, directly in Apache.
>>
>> In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop" is a commitment to working more effectively with the Apache Hadoop community. Our goal is to make Apache Hadoop THE open source platform for big data.
>>
>> Thanks,
>>
>> E14
>>
>> --
>>
>> PS Here is a draft list of key features in hadoop-future:
>>
>> * HDFS-1052 - Federation, the ability to support much more storage per Hadoop cluster.
>> * HADOOP-6728 - A new metrics framework
>> * MAPREDUCE-1220 - Optimizations for small jobs
>>
>> ---
>> PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
> From: Alan Gates
>
> We will be proposing Howl as an Incubator project soon.

That would be excellent.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
We will be proposing Howl as an Incubator project soon.

Alan.

On Jan 31, 2011, at 7:44 PM, Jeff Hammerbacher wrote:

> Excellent news! Will you also make Howl, Oozie, and Yarn Apache projects as well?
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
Congratulations Eric. this is fantastic news.

On Jan 31, 2011, at 10:27 PM, Eric Baldeschwieler wrote:

> I'm pleased to announce that after some reflection, Yahoo! has decided to discontinue the "The Yahoo Distribution of Hadoop" and focus on Apache Hadoop.
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
Excellent news! Will you also make Howl, Oozie, and Yarn Apache projects as well?

On Mon, Jan 31, 2011 at 7:27 PM, Eric Baldeschwieler wrote:

> I'm pleased to announce that after some reflection, Yahoo! has decided to discontinue the "The Yahoo Distribution of Hadoop" and focus on Apache Hadoop.
[ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
Hi Folks,

I'm pleased to announce that after some reflection, Yahoo! has decided to discontinue "The Yahoo Distribution of Hadoop" and focus on Apache Hadoop. We plan to remove all references to a Yahoo distribution from our website (developer.yahoo.com/hadoop), close our github repo (yahoo.github.com/hadoop-common) and focus on working more closely with the Apache community. Our intent is to return to helping Apache produce binary releases of Apache Hadoop that are so bulletproof that Yahoo and other production Hadoop users can run them unpatched on their clusters.

Until Hadoop 0.20, Yahoo committers worked as release masters to produce binary Apache Hadoop releases that the entire community used on their clusters. As the community grew, we experimented with using the "Yahoo! Distribution of Hadoop" as the vehicle to share our work. Unfortunately, Apache is no longer the obvious place to go for Hadoop releases. The Yahoo! team wants to return to a world where anyone can download and directly use releases of Hadoop from Apache. We want to contribute to the stabilization and testing of those releases. We also want to share our regular program of sustaining engineering that backports minor feature enhancements into new dot releases on a regular basis, so that the world sees regular improvements coming from Apache every few months, not years.

Recently the Apache Hadoop community has been very turbulent. Over the last few months we have been developing Hadoop enhancements in our internal git repository while doing a complete review of our options. Our commitment to open sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd), but the future of the "Yahoo distribution of Hadoop" was far from clear. We've concluded that focusing on Apache Hadoop is the way forward. We believe that more focus on communicating our goals to the Apache Hadoop community, and more willingness to compromise on how we get to those goals, will help us get back to making Hadoop even better.

Unfortunately, we now have to sort out how to contribute several person-years' worth of work to Apache to let us unwind the Yahoo! git repositories. We currently run two lines of Hadoop development, our sustaining program (hadoop-0.20-sustaining) and hadoop-future. Hadoop-0.20-sustaining is the stable version of Hadoop we currently run on Yahoo's 40,000 nodes. It contains a series of fixes and enhancements that are all backwards compatible with our "Hadoop 0.20 with security". It is our most stable and high-performance release of Hadoop ever. We've expended a lot of energy finding and fixing bugs in it this year. We have initiated the process of contributing this work to Apache in the branch: hadoop/common/branches/branch-0.20-security. We've proposed calling this the 20.100 release. Once folks have had a chance to try this out and we've had a chance to respond to their feedback, we plan to create 20.100 release candidates and ask the community to vote on making them Apache releases.

Hadoop-future is our new feature branch. We are working on a set of new features for Hadoop to improve its availability, scalability and interoperability to make Hadoop more usable in mission-critical deployments. You're going to see another burst of email activity from us as we work to get hadoop-future patches socialized, reviewed and checked in. These bulk checkins are exceptional. They are the result of us striving to be more transparent. Once we've merged our hadoop-future and hadoop-0.20-sustaining work back into Apache, folks can expect us to return to our regular development cadence. Looking forward, we plan to socialize our roadmaps regularly, actively synchronize our work with other active Hadoop contributors and develop our code collaboratively, directly in Apache.

In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop" is a commitment to working more effectively with the Apache Hadoop community. Our goal is to make Apache Hadoop THE open source platform for big data.

Thanks,

E14

--
PS Here is a draft list of key features in hadoop-future:

* HDFS-1052 - Federation, the ability to support much more storage per Hadoop cluster.
* HADOOP-6728 - A new metrics framework
* MAPREDUCE-1220 - Optimizations for small jobs

---
PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W