Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-04-08 Thread Arun C Murthy


On Apr 8, 2011, at 11:08 AM, Todd Lipcon wrote:

These all have patches that are pretty small, and I'd imagine would  
apply pretty easily to trunk. Let me know if you'd like any help  
forward-porting.




Thanks Todd, I'm happy to help review etc.

The other ones, as new features/improvements, I'd agree it makes  
sense not to waste effort re-implementing them for trunk MR, but  
rather to make sure they're incorporated in next-gen.


Yep, exactly. Glad to know it makes sense.

thanks,
Arun


Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-04-08 Thread Eric Baldeschwieler
Thanks Todd, your help with the jiras you IDed would be welcome!

---
E14 - typing on glass

On Apr 8, 2011, at 11:09 AM, "Todd Lipcon"  wrote:

> On Fri, Apr 8, 2011 at 10:34 AM, Arun C Murthy  wrote:
> 
>> 
>> On Apr 7, 2011, at 4:22 PM, Todd Lipcon wrote:
>> 
>> Is there a list available of which patches you've made this decision
>>> about? I'm curious, for example, about MAPREDUCE-2178 -- as of today, the MR
>>> security in trunk has a serious vulnerability. Do we plan on fixing it, or
>>> will the answer be that, if anyone needs security, they must update to "MR
>>> Next Gen"?
>>> 
>> 
>> Apologies if my original message was abstruse - I want to ensure that there
>> is no confusion between 'forward-port' and 'merge from yahoo-merge branch'.
>> 
>> Let me try to explain again: there are several forward ports from the
>> hadoop-0.20-2xx (branch-0.20-security) which are complete, including
>> MAPREDUCE-2178. They are currently part of the 'yahoo-merge' branch in
>> MapReduce. These are awaiting a merge into trunk. Trunk (with a few merges
>> from yahoo-merge) will have a complete security implementation.
>> 
> 
> Ah, OK, I see. That makes sense.
> 
> 
>> 
>> My message was intended to highlight some small number of features/bugs
>> which are/will-be in hadoop-0.20.2xx. Here is a nearly complete list of such
>> jiras: MAPREDUCE-517, MAPREDUCE-1872, MAPREDUCE-291, MAPREDUCE-2418,
>> MAPREDUCE-2409, MAPREDUCE-2411. I'll check to ensure there aren't others.
>> 
>> 
>> 
> Looking briefly at those, it seems that the ones that are clear bugs (with
> small fixes) should be put in the current MR implementation:
> MAPREDUCE-2411
> MAPREDUCE-2409
> MAPREDUCE-2418 (maybe)
> 
> These all have patches that are pretty small, and I'd imagine would apply
> pretty easily to trunk. Let me know if you'd like any help forward-porting.
> 
> The other ones, as new features/improvements, I'd agree it makes sense not
> to waste effort re-implementing them for trunk MR, but rather to make sure
> they're incorporated in next-gen.
> 
> -Todd
> -- 
> Todd Lipcon
> Software Engineer, Cloudera


Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-04-08 Thread Todd Lipcon
On Fri, Apr 8, 2011 at 10:34 AM, Arun C Murthy  wrote:

>
> On Apr 7, 2011, at 4:22 PM, Todd Lipcon wrote:
>
>  Is there a list available of which patches you've made this decision
>> about? I'm curious, for example, about MAPREDUCE-2178 -- as of today, the MR
>> security in trunk has a serious vulnerability. Do we plan on fixing it, or
>> will the answer be that, if anyone needs security, they must update to "MR
>> Next Gen"?
>>
>
> Apologies if my original message was abstruse - I want to ensure that there
> is no confusion between 'forward-port' and 'merge from yahoo-merge branch'.
>
> Let me try to explain again: there are several forward ports from the
> hadoop-0.20-2xx (branch-0.20-security) which are complete, including
> MAPREDUCE-2178. They are currently part of the 'yahoo-merge' branch in
> MapReduce. These are awaiting a merge into trunk. Trunk (with a few merges
> from yahoo-merge) will have a complete security implementation.
>

Ah, OK, I see. That makes sense.


>
> My message was intended to highlight some small number of features/bugs
> which are/will-be in hadoop-0.20.2xx. Here is a nearly complete list of such
> jiras: MAPREDUCE-517, MAPREDUCE-1872, MAPREDUCE-291, MAPREDUCE-2418,
> MAPREDUCE-2409, MAPREDUCE-2411. I'll check to ensure there aren't others.
>
>
>
Looking briefly at those, it seems that the ones that are clear bugs (with
small fixes) should be put in the current MR implementation:
MAPREDUCE-2411
MAPREDUCE-2409
MAPREDUCE-2418 (maybe)

These all have patches that are pretty small, and I'd imagine would apply
pretty easily to trunk. Let me know if you'd like any help forward-porting.

The other ones, as new features/improvements, I'd agree it makes sense not
to waste effort re-implementing them for trunk MR, but rather to make sure
they're incorporated in next-gen.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-04-08 Thread Arun C Murthy

Todd,

On Apr 7, 2011, at 4:22 PM, Todd Lipcon wrote:

Is there a list available of which patches you've made this decision  
about? I'm curious, for example, about MAPREDUCE-2178 -- as of  
today, the MR security in trunk has a serious vulnerability. Do we  
plan on fixing it, or will the answer be that, if anyone needs  
security, they must update to "MR Next Gen"?


Apologies if my original message was abstruse - I want to ensure that
there is no confusion between 'forward-port' and 'merge from
yahoo-merge branch'.


Let me try to explain again: there are several forward ports from the  
hadoop-0.20-2xx (branch-0.20-security) which are complete, including  
MAPREDUCE-2178. They are currently part of the 'yahoo-merge' branch in  
MapReduce. These are awaiting a merge into trunk. Trunk (with a few  
merges from yahoo-merge) will have a complete security implementation.


My message was intended to highlight some small number of
features/bugs which are/will-be in hadoop-0.20.2xx. Here is a nearly
complete list of such jiras: MAPREDUCE-517, MAPREDUCE-1872,
MAPREDUCE-291, MAPREDUCE-2418, MAPREDUCE-2409, MAPREDUCE-2411. I'll
check to ensure there aren't others.


Hope that makes sense. Again, apologies for any confusion I've caused.

thanks,
Arun



Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-04-07 Thread Todd Lipcon
Is there a list available of which patches you've made this decision about?
I'm curious, for example, about MAPREDUCE-2178 -- as of today, the MR
security in trunk has a serious vulnerability. Do we plan on fixing it, or
will the answer be that, if anyone needs security, they must update to "MR
Next Gen"?

-Todd

On Thu, Apr 7, 2011 at 3:52 PM, Arun C Murthy  wrote:

>
> On Feb 14, 2011, at 1:34 PM, Arun C Murthy wrote:
>
>>
>> As the final installment in this process, I've started a discussion on
>> us contributing a re-factor of Map-Reduce in
>> https://issues.apache.org/jira/browse/MAPREDUCE-279.
>>
>
>
>
> Hi Folks,
>
> We wanted to share our thoughts around the co-development of the NextGen
> MapReduce branch (Jira MR-279), maintaining the branch-0.20-security and
> merging the work on the security branch with trunk.  We've concluded that it
> does not make sense for us to port a very small subset of the work from the
> branch-0.20-security to the Hadoop mainline.  The JIRAs we don't plan to
> port all affect areas of the mainline that are going to be replaced by work
> in the NextGen MapReduce branch (
> http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MR-279/).
>
> We've been working on the NextGen MapReduce branch (MAPREDUCE-279) within
> Apache for a while now and are excited about its progress.  We think that
> this branch will be a huge improvement in scalability, performance and
> functionality.  We are now confident that we can get it ready for release in
> the next few months.  We believe that the next major release of Apache
> Hadoop we will test at Yahoo will include the work in this branch and we are
> committed to merging the NextGen branch into the mainline after the PMC
> approves the merge.
>
> Meanwhile, we have continued to find and fix bugs on branch-0.20-security
> and have been working to port that work into the Hadoop mainline.  Most of
> this work is done and we've also brought all the patches in from our github
> branch into apache subversion, so that it is easy for everyone to see the
> work remaining.  What we've found is that some of the work in
> branch-0.20-security is in code sections that have been completely replaced
> / refactored in the NextGen MapReduce branch.  Since we are committed to the
> NextGen branch, we don't think there is any upside in porting this code into
> portions of mainline we expect to discard. All of these JIRAs will be fixed
> in the NextGen MapReduce branch and through there ultimately in trunk
> (assuming the PMC approves the merge).
>
> So at this point it is our intent to not port the JIRAs listed above to
> trunk, but to wait until we merge NextGen into trunk to resolve these issues
> there.  If you are interested in seeing these issues ported to mainline, let
> us know.  We are happy to help review your patches and explain context to
> anyone who is interested in doing this work.
>
> Arun and Eric
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-04-07 Thread Arun C Murthy


On Feb 14, 2011, at 1:34 PM, Arun C Murthy wrote:


As the final installment in this process, I've started a discussion on
us contributing a re-factor of Map-Reduce in
https://issues.apache.org/jira/browse/MAPREDUCE-279.




Hi Folks,

We wanted to share our thoughts around the co-development of the
NextGen MapReduce branch (Jira MR-279), maintaining the
branch-0.20-security and merging the work on the security branch with
trunk.  We've concluded that it does not make sense for us to port a
very small subset of the work from the branch-0.20-security to the
Hadoop mainline.  The JIRAs we don't plan to port all affect areas of
the mainline that are going to be replaced by work in the NextGen
MapReduce branch
(http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MR-279/).


We've been working on the NextGen MapReduce branch (MAPREDUCE-279)  
within Apache for a while now and are excited about its progress.  We
think that this branch will be a huge improvement in scalability,
performance and functionality.  We are now confident that we can get
it ready for release in the next few months.  We believe that the
next major release of Apache Hadoop we will test at Yahoo will include  
the work in this branch and we are committed to merging the NextGen  
branch into the mainline after the PMC approves the merge.


Meanwhile, we have continued to find and fix bugs on
branch-0.20-security and have been working to port that work into the Hadoop
mainline.  Most of this work is done and we've also brought all the  
patches in from our github branch into apache subversion, so that it  
is easy for everyone to see the work remaining.  What we've found is  
that some of the work in branch-0.20-security is in code sections that  
have been completely replaced / refactored in the NextGen MapReduce  
branch.  Since we are committed to the NextGen branch, we don't think  
there is any upside in porting this code into portions of mainline we  
expect to discard. All of these JIRAs will be fixed in the NextGen  
MapReduce branch and through there ultimately in trunk (assuming the  
PMC approves the merge).


So at this point it is our intent to not port the JIRAs listed above  
to trunk, but to wait until we merge NextGen into trunk to resolve  
these issues there.  If you are interested in seeing these issues  
ported to mainline, let us know.  We are happy to help review your  
patches and explain context to anyone who is interested in doing this  
work.


Arun and Eric


Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-02-14 Thread Arun C Murthy


On Feb 11, 2011, at 2:56 PM, Owen O'Malley wrote:



On Jan 31, 2011, at 7:27 PM, Eric Baldeschwieler wrote:


Unfortunately, we now have to sort out how to contribute several
person-years worth of work to Apache to let us unwind the Yahoo! git
repositories.  We currently run two lines of Hadoop development, our
sustaining program (hadoop-0.20-sustaining) and hadoop-future.


I also plan to start pushing the hadoop-future work into a branch
called yahoo-merge(?)  as individual commits from our internal git
repository. The goal of creating the branch is to enable faster review
and discussion. These patches will be individually run through the
jira, review, and commit process to be added to trunk.


As the final installment in this process, I've started a discussion on  
us contributing a re-factor of Map-Reduce in
https://issues.apache.org/jira/browse/MAPREDUCE-279.


We have a prototype we'd like to commit to a branch soon, where we  
look forward to feedback. From there on, we would love to collaborate  
to get it committed to trunk.


thanks,
Arun




Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-02-11 Thread Owen O'Malley


On Jan 31, 2011, at 7:27 PM, Eric Baldeschwieler wrote:

Unfortunately, we now have to sort out how to contribute several  
person-years worth of work to Apache to let us unwind the Yahoo! git  
repositories.  We currently run two lines of Hadoop development, our  
sustaining program (hadoop-0.20-sustaining) and hadoop-future.


As Eric mentioned, we have several person years worth of development  
to contribute to Apache. Arun has started that process by creating the  
branch-0.20-security branch. I plan on pushing the individual patches  
to branch-0.20-security-patches and then when it is identical with  
Arun's branch, I'll rename mine to branch-0.20-security.


I also plan to start pushing the hadoop-future work into a branch  
called yahoo-merge(?)  as individual commits from our internal git  
repository. The goal of creating the branch is to enable faster review  
and discussion. These patches will be individually run through the  
jira, review, and commit process to be added to trunk.


-- Owen


Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-02-01 Thread Todd Papaioannou
Yes. We have been and continue to be firm believers in Apache and the value of 
Open Source software, as you can see from our track record to date of 
contributing heavily to Hadoop and donating Pig, ZooKeeper, Avro, etc. We are 
excited about their potential and we hope others will find them useful too.

ToddP

On 1/31/11 7:44 PM, "Jeff Hammerbacher" <ham...@cloudera.com> wrote:

Excellent news! Will you also make Howl, Oozie, and Yarn Apache projects as
well?

On Mon, Jan 31, 2011 at 7:27 PM, Eric Baldeschwieler <eri...@yahoo-inc.com> wrote:

Hi Folks,

I'm pleased to announce that after some reflection, Yahoo! has decided to
discontinue "The Yahoo Distribution of Hadoop" and focus on Apache
Hadoop.  We plan to remove all references to a Yahoo distribution from our
website (developer.yahoo.com/hadoop), close our github repo (
yahoo.github.com/hadoop-common) and focus on working more closely with the
Apache community.  Our intent is to return to helping Apache produce binary
releases of Apache Hadoop that are so bullet proof that Yahoo and other
production Hadoop users can run them unpatched on their clusters.

Until Hadoop 0.20, Yahoo committers worked as release masters to produce
binary Apache Hadoop releases that the entire community used on their
clusters.  As the community grew, we have experimented with using the
"Yahoo! Distribution of Hadoop" as the vehicle to share our work.
  Unfortunately, Apache is no longer the obvious place to go for Hadoop
releases.  The Yahoo! team wants to return to a world where anyone can
download and directly use releases of Hadoop from Apache.  We want to
contribute to the stabilization and testing of those releases.  We also want
to share our regular program of sustaining engineering that backports minor
feature enhancements into new dot releases on a regular basis, so that the
world sees regular improvements coming from Apache every few months, not
years.

Recently the Apache Hadoop community has been very turbulent.  Over the
last few months we have been developing Hadoop enhancements in our internal
git repository while doing a complete review of our options. Our commitment
to open sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd),
but the future of the "Yahoo distribution of Hadoop" was far from clear.
  We've concluded that focusing on Apache Hadoop is the way forward.  We
believe that more focus on communicating our goals to the Apache Hadoop
community, and more willingness to compromise on how we get to those goals,
will help us get back to making Hadoop even better.

Unfortunately, we now have to sort out how to contribute several
person-years worth of work to Apache to let us unwind the Yahoo! git
repositories.  We currently run two lines of Hadoop development, our
sustaining program (hadoop-0.20-sustaining) and hadoop-future.
  Hadoop-0.20-sustaining is the stable version of Hadoop we currently run on
Yahoo's 40,000 nodes.  It contains a series of fixes and enhancements that
are all backwards compatible with our "Hadoop 0.20 with security".  It is
our most stable and high performance release of Hadoop ever.  We've expended
a lot of energy finding and fixing bugs in it this year. We have initiated
the process of contributing this work to Apache in the branch:
hadoop/common/branches/branch-0.20-security.  We've proposed calling this
the 20.100 release.  Once folks have had a chance to try this out and we've
had a chance to respond to their feedback, we plan to create 20.100 release
candidates and ask the community to vote on making them Apache releases.

Hadoop-future is our new feature branch.  We are working on a set of new
features for Hadoop to improve its availability, scalability and
interoperability to make Hadoop more usable in mission critical deployments.
You're going to see another burst of email activity from us as we work to
get hadoop-future patches socialized, reviewed and checked in.  These bulk
checkins are exceptional.  They are the result of us striving to be more
transparent.  Once we've merged our hadoop-future and hadoop-0.20-sustaining
work back into Apache, folks can expect us to return to our regular
development cadence.  Looking forward, we plan to socialize our roadmaps
regularly, actively synchronize our work with other active Hadoop
contributors and develop our code collaboratively, directly in Apache.

In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop"
is a commitment to working more effectively with the Apache Hadoop
community.  Our goal is to make Apache Hadoop THE open source platform for
big data.

Thanks,

E14

--

PS Here is a draft list of key features in hadoop-future:

* HDFS-1052 - Federation, the ability to support much more storage per
Hadoop cluster.

* HADOOP-6728 - A new metrics framework

* MAPREDUCE-1220 - Optimizations for small jobs

---
PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W



Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-02-01 Thread Andrew Purtell
> From: Alan Gates 
>
> We will be proposing Howl as an Incubator project soon.

That would be excellent.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)



  


Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-02-01 Thread Alan Gates

We will be proposing Howl as an Incubator project soon.

Alan.

On Jan 31, 2011, at 7:44 PM, Jeff Hammerbacher wrote:

Excellent news! Will you also make Howl, Oozie, and Yarn Apache projects as
well?

On Mon, Jan 31, 2011 at 7:27 PM, Eric Baldeschwieler wrote:

Hi Folks,

I'm pleased to announce that after some reflection, Yahoo! has decided to
discontinue "The Yahoo Distribution of Hadoop" and focus on Apache
Hadoop.  We plan to remove all references to a Yahoo distribution from our
website (developer.yahoo.com/hadoop), close our github repo
(yahoo.github.com/hadoop-common) and focus on working more closely with the
Apache community.  Our intent is to return to helping Apache produce binary
releases of Apache Hadoop that are so bullet proof that Yahoo and other
production Hadoop users can run them unpatched on their clusters.

Until Hadoop 0.20, Yahoo committers worked as release masters to produce
binary Apache Hadoop releases that the entire community used on their
clusters.  As the community grew, we have experimented with using the
"Yahoo! Distribution of Hadoop" as the vehicle to share our work.
Unfortunately, Apache is no longer the obvious place to go for Hadoop
releases.  The Yahoo! team wants to return to a world where anyone can
download and directly use releases of Hadoop from Apache.  We want to
contribute to the stabilization and testing of those releases.  We also want
to share our regular program of sustaining engineering that backports minor
feature enhancements into new dot releases on a regular basis, so that the
world sees regular improvements coming from Apache every few months, not
years.

Recently the Apache Hadoop community has been very turbulent.  Over the
last few months we have been developing Hadoop enhancements in our internal
git repository while doing a complete review of our options. Our commitment
to open sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd),
but the future of the "Yahoo distribution of Hadoop" was far from clear.
We've concluded that focusing on Apache Hadoop is the way forward.  We
believe that more focus on communicating our goals to the Apache Hadoop
community, and more willingness to compromise on how we get to those goals,
will help us get back to making Hadoop even better.

Unfortunately, we now have to sort out how to contribute several
person-years worth of work to Apache to let us unwind the Yahoo! git
repositories.  We currently run two lines of Hadoop development, our
sustaining program (hadoop-0.20-sustaining) and hadoop-future.
Hadoop-0.20-sustaining is the stable version of Hadoop we currently run on
Yahoo's 40,000 nodes.  It contains a series of fixes and enhancements that
are all backwards compatible with our "Hadoop 0.20 with security".  It is
our most stable and high performance release of Hadoop ever.  We've expended
a lot of energy finding and fixing bugs in it this year. We have initiated
the process of contributing this work to Apache in the branch:
hadoop/common/branches/branch-0.20-security.  We've proposed calling this
the 20.100 release.  Once folks have had a chance to try this out and we've
had a chance to respond to their feedback, we plan to create 20.100 release
candidates and ask the community to vote on making them Apache releases.

Hadoop-future is our new feature branch.  We are working on a set of new
features for Hadoop to improve its availability, scalability and
interoperability to make Hadoop more usable in mission critical deployments.
You're going to see another burst of email activity from us as we work to
get hadoop-future patches socialized, reviewed and checked in.  These bulk
checkins are exceptional.  They are the result of us striving to be more
transparent.  Once we've merged our hadoop-future and hadoop-0.20-sustaining
work back into Apache, folks can expect us to return to our regular
development cadence.  Looking forward, we plan to socialize our roadmaps
regularly, actively synchronize our work with other active Hadoop
contributors and develop our code collaboratively, directly in Apache.

In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop"
is a commitment to working more effectively with the Apache Hadoop
community.  Our goal is to make Apache Hadoop THE open source platform for
big data.

Thanks,

E14

--

PS Here is a draft list of key features in hadoop-future:

* HDFS-1052 - Federation, the ability to support much more storage per
Hadoop cluster.

* HADOOP-6728 - A new metrics framework

* MAPREDUCE-1220 - Optimizations for small jobs

---
PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W




Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-02-01 Thread Ian Holsman
Congratulations, Eric.
This is fantastic news.

On Jan 31, 2011, at 10:27 PM, Eric Baldeschwieler wrote:

> Hi Folks,
> 
> I'm pleased to announce that after some reflection, Yahoo! has decided to 
> discontinue "The Yahoo Distribution of Hadoop" and focus on Apache 
> Hadoop.  We plan to remove all references to a Yahoo distribution from our 
> website (developer.yahoo.com/hadoop), close our github repo 
> (yahoo.github.com/hadoop-common) and focus on working more closely with the 
> Apache community.  Our intent is to return to helping Apache produce binary 
> releases of Apache Hadoop that are so bullet proof that Yahoo and other 
> production Hadoop users can run them unpatched on their clusters.
> 
> Until Hadoop 0.20, Yahoo committers worked as release masters to produce 
> binary Apache Hadoop releases that the entire community used on their 
> clusters.  As the community grew, we have experimented with using the "Yahoo! 
> Distribution of Hadoop" as the vehicle to share our work.  Unfortunately, 
> Apache is no longer the obvious place to go for Hadoop releases.  The Yahoo! 
> team wants to return to a world where anyone can download and directly use 
> releases of Hadoop from Apache.  We want to contribute to the stabilization 
> and testing of those releases.  We also want to share our regular program of 
> sustaining engineering that backports minor feature enhancements into new dot 
> releases on a regular basis, so that the world sees regular improvements 
> coming from Apache every few months, not years.
> 
> Recently the Apache Hadoop community has been very turbulent.  Over the last 
> few months we have been developing Hadoop enhancements in our internal git 
> repository while doing a complete review of our options. Our commitment to 
> open sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd), but 
> the future of the "Yahoo distribution of Hadoop" was far from clear.  We've 
> concluded that focusing on Apache Hadoop is the way forward.  We believe that 
> more focus on communicating our goals to the Apache Hadoop community, and 
> more willingness to compromise on how we get to those goals, will help us get 
> back to making Hadoop even better.
> 
> Unfortunately, we now have to sort out how to contribute several person-years 
> worth of work to Apache to let us unwind the Yahoo! git repositories.  We 
> currently run two lines of Hadoop development, our sustaining program 
> (hadoop-0.20-sustaining) and hadoop-future.  Hadoop-0.20-sustaining is the 
> stable version of Hadoop we currently run on Yahoo's 40,000 nodes.  It 
> contains a series of fixes and enhancements that are all backwards compatible 
> with our "Hadoop 0.20 with security".  It is our most stable and high 
> performance release of Hadoop ever.  We've expended a lot of energy finding 
> and fixing bugs in it this year. We have initiated the process of 
> contributing this work to Apache in the branch: 
> hadoop/common/branches/branch-0.20-security.  We've proposed calling this the 
> 20.100 release.  Once folks have had a chance to try this out and we've had a 
> chance to respond to their feedback, we plan to create 20.100 release 
> candidates and ask the community to vote on making them Apache releases. 
> 
> Hadoop-future is our new feature branch.  We are working on a set of new 
> features for Hadoop to improve its availability, scalability and 
> interoperability to make Hadoop more usable in mission critical deployments. 
> You're going to see another burst of email activity from us as we work to get 
> hadoop-future patches socialized, reviewed and checked in.  These bulk 
> checkins are exceptional.  They are the result of us striving to be more 
> transparent.  Once we've merged our hadoop-future and hadoop-0.20-sustaining 
> work back into Apache, folks can expect us to return to our regular 
> development cadence.  Looking forward, we plan to socialize our roadmaps 
> regularly, actively synchronize our work with other active Hadoop 
> contributors and develop our code collaboratively, directly in Apache.
> 
> In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop" 
> is a commitment to working more effectively with the Apache Hadoop community. 
>  Our goal is to make Apache Hadoop THE open source platform for big data.
> 
> Thanks,
> 
> E14
> 
> --
> 
> PS Here is a draft list of key features in hadoop-future:
> 
> * HDFS-1052 - Federation, the ability to support much more storage per Hadoop 
> cluster.
> 
> * HADOOP-6728 - A new metrics framework
> 
> * MAPREDUCE-1220 - Optimizations for small jobs
> 
> ---
> PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W



Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-01-31 Thread Jeff Hammerbacher
Excellent news! Will you also make Howl, Oozie, and Yarn Apache projects as
well?

On Mon, Jan 31, 2011 at 7:27 PM, Eric Baldeschwieler
wrote:

> Hi Folks,
>
> I'm pleased to announce that after some reflection, Yahoo! has decided to
> discontinue "The Yahoo Distribution of Hadoop" and focus on Apache
> Hadoop.  We plan to remove all references to a Yahoo distribution from our
> website (developer.yahoo.com/hadoop), close our github repo (
> yahoo.github.com/hadoop-common) and focus on working more closely with the
> Apache community.  Our intent is to return to helping Apache produce binary
> releases of Apache Hadoop that are so bullet proof that Yahoo and other
> production Hadoop users can run them unpatched on their clusters.
>
> Until Hadoop 0.20, Yahoo committers worked as release masters to produce
> binary Apache Hadoop releases that the entire community used on their
> clusters.  As the community grew, we have experimented with using the
> "Yahoo! Distribution of Hadoop" as the vehicle to share our work.
>  Unfortunately, Apache is no longer the obvious place to go for Hadoop
> releases.  The Yahoo! team wants to return to a world where anyone can
> download and directly use releases of Hadoop from Apache.  We want to
> contribute to the stabilization and testing of those releases.  We also want
> to share our regular program of sustaining engineering that backports minor
> feature enhancements into new dot releases on a regular basis, so that the
> world sees regular improvements coming from Apache every few months, not
> years.
>
> Recently the Apache Hadoop community has been very turbulent.  Over the
> last few months we have been developing Hadoop enhancements in our internal
> git repository while doing a complete review of our options. Our commitment
> to open sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd),
> but the future of the "Yahoo distribution of Hadoop" was far from clear.
>  We've concluded that focusing on Apache Hadoop is the way forward.  We
> believe that more focus on communicating our goals to the Apache Hadoop
> community, and more willingness to compromise on how we get to those goals,
> will help us get back to making Hadoop even better.
>
> Unfortunately, we now have to sort out how to contribute several
> person-years worth of work to Apache to let us unwind the Yahoo! git
> repositories.  We currently run two lines of Hadoop development, our
> sustaining program (hadoop-0.20-sustaining) and hadoop-future.
>  Hadoop-0.20-sustaining is the stable version of Hadoop we currently run on
> Yahoo's 40,000 nodes.  It contains a series of fixes and enhancements that
> are all backwards compatible with our "Hadoop 0.20 with security".  It is
> our most stable and high performance release of Hadoop ever.  We've expended
> a lot of energy finding and fixing bugs in it this year. We have initiated
> the process of contributing this work to Apache in the branch:
> hadoop/common/branches/branch-0.20-security.  We've proposed calling this
> the 20.100 release.  Once folks have had a chance to try this out and we've
> had a chance to respond to their feedback, we plan to create 20.100 release
> candidates and ask the community to vote on making them Apache releases.
>
> Hadoop-future is our new feature branch.  We are working on a set of new
> features for Hadoop to improve its availability, scalability and
> interoperability to make Hadoop more usable in mission critical deployments.
> You're going to see another burst of email activity from us as we work to
> get hadoop-future patches socialized, reviewed and checked in.  These bulk
> checkins are exceptional.  They are the result of us striving to be more
> transparent.  Once we've merged our hadoop-future and hadoop-0.20-sustaining
> work back into Apache, folks can expect us to return to our regular
> development cadence.  Looking forward, we plan to socialize our roadmaps
> regularly, actively synchronize our work with other active Hadoop
> contributors and develop our code collaboratively, directly in Apache.
>
> In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop"
> is a commitment to working more effectively with the Apache Hadoop
> community.  Our goal is to make Apache Hadoop THE open source platform for
> big data.
>
> Thanks,
>
> E14
>
> --
>
> PS Here is a draft list of key features in hadoop-future:
>
> * HDFS-1052 - Federation, the ability to support much more storage per
> Hadoop cluster.
>
> * HADOOP-6728 - A new metrics framework
>
> * MAPREDUCE-1220 - Optimizations for small jobs
>
> ---
> PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W


[ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-01-31 Thread Eric Baldeschwieler
Hi Folks,

I'm pleased to announce that after some reflection, Yahoo! has decided to 
discontinue "The Yahoo Distribution of Hadoop" and focus on Apache Hadoop. 
 We plan to remove all references to a Yahoo distribution from our website 
(developer.yahoo.com/hadoop), close our github repo 
(yahoo.github.com/hadoop-common) and focus on working more closely with the 
Apache community.  Our intent is to return to helping Apache produce binary 
releases of Apache Hadoop that are so bulletproof that Yahoo and other 
production Hadoop users can run them unpatched on their clusters.

Until Hadoop 0.20, Yahoo committers worked as release masters to produce binary 
Apache Hadoop releases that the entire community used on their clusters.  As 
the community grew, we experimented with using the "Yahoo! Distribution of 
Hadoop" as the vehicle to share our work.  Unfortunately, Apache is no longer 
the obvious place to go for Hadoop releases.  The Yahoo! team wants to return 
to a world where anyone can download and directly use releases of Hadoop from 
Apache.  We want to contribute to the stabilization and testing of those 
releases.  We also want to share our regular program of sustaining engineering 
that backports minor feature enhancements into new dot releases on a regular 
basis, so that the world sees regular improvements coming from Apache every few 
months, not years.

Recently the Apache Hadoop community has been very turbulent.  Over the last 
few months we have been developing Hadoop enhancements in our internal git 
repository while doing a complete review of our options. Our commitment to open 
sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd), but the 
future of the "Yahoo distribution of Hadoop" was far from clear.  We've 
concluded that focusing on Apache Hadoop is the way forward.  We believe that 
more focus on communicating our goals to the Apache Hadoop community, and more 
willingness to compromise on how we get to those goals, will help us get back 
to making Hadoop even better.

Unfortunately, we now have to sort out how to contribute several person-years 
worth of work to Apache to let us unwind the Yahoo! git repositories.  We 
currently run two lines of Hadoop development, our sustaining program 
(hadoop-0.20-sustaining) and hadoop-future.  Hadoop-0.20-sustaining is the 
stable version of Hadoop we currently run on Yahoo's 40,000 nodes.  It contains 
a series of fixes and enhancements that are all backwards compatible with our 
"Hadoop 0.20 with security".  It is our most stable and high performance 
release of Hadoop ever.  We've expended a lot of energy finding and fixing bugs 
in it this year. We have initiated the process of contributing this work to 
Apache in the branch: hadoop/common/branches/branch-0.20-security.  We've 
proposed calling this the 20.100 release.  Once folks have had a chance to try 
this out and we've had a chance to respond to their feedback, we plan to create 
20.100 release candidates and ask the community to vote on making them Apache 
releases. 

Hadoop-future is our new feature branch.  We are working on a set of new 
features for Hadoop to improve its availability, scalability and 
interoperability to make Hadoop more usable in mission critical deployments. 
You're going to see another burst of email activity from us as we work to get 
hadoop-future patches socialized, reviewed and checked in.  These bulk checkins 
are exceptional.  They are the result of us striving to be more transparent.  
Once we've merged our hadoop-future and hadoop-0.20-sustaining work back into 
Apache, folks can expect us to return to our regular development cadence.  
Looking forward, we plan to socialize our roadmaps regularly, actively 
synchronize our work with other active Hadoop contributors and develop our code 
collaboratively, directly in Apache.

In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop" is 
a commitment to working more effectively with the Apache Hadoop community.  Our 
goal is to make Apache Hadoop THE open source platform for big data.

Thanks,

E14

--

PS Here is a draft list of key features in hadoop-future:

* HDFS-1052 - Federation, the ability to support much more storage per Hadoop 
cluster.

* HADOOP-6728 - A new metrics framework

* MAPREDUCE-1220 - Optimizations for small jobs

---
PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W