AFAICT, Owen was the one to -1 removal of HDFS Proxy.  Owen, are you guys 
maintaining this?

Cheers,
Nige

On Apr 4, 2011, at 12:19 PM, Todd Lipcon wrote:

> Could those of you who -1ed the removal of HDFS Proxy please look into the
> test that has been failing our Hudson build for the last several months:
> https://issues.apache.org/jira/browse/HDFS-1666
> 
> It is one thing to say that
> we "should" maintain a piece of code, but it's another to actually maintain
> it. In my mind, part of maintaining a project involves addressing consistent
> test failures as high priority items.
> 
> -Todd
> 
> On Tue, Feb 22, 2011 at 9:27 PM, Nigel Daley <nda...@mac.com> wrote:
> 
>> For closure, this vote fails due to a couple binding -1 votes.
>> 
>> Nige
>> 
>> On Feb 18, 2011, at 4:46 AM, Eric Baldeschwieler wrote:
>> 
>>> Hi Bernd,
>>> 
>>> Apache Hadoop is about scale. Most clusters will always be small, but
>> Hadoop is going mainstream precisely because it scales to huge data and
>> cluster sizes.
>>> 
>>> There are lots of systems that work well on 10 node clusters. People
>>> select Hadoop because they are confident that as their business /
>>> problem grows, Hadoop can grow with it.
>>> 
>>> ---
>>> E14 - via iPhone
>>> 
>>> On Feb 17, 2011, at 7:25 AM, "Bernd Fondermann" <
>> bernd.fonderm...@googlemail.com> wrote:
>>> 
>>>> On Thu, Feb 17, 2011 at 14:58, Ian Holsman <had...@holsman.net> wrote:
>>>>> Hi Bernd.
>>>>> 
>>>>> On Feb 17, 2011, at 7:43 AM, Bernd Fondermann wrote:
>>>>>> 
>>>>>> We have the very unfortunate situation here at Hadoop where Apache
>>>>>> Hadoop is not the primary and foremost place of Hadoop development.
>>>>>> Instead, code is developed internally at Yahoo and then contributed in
>>>>>> (smaller or larger) chunks to Hadoop.
>>>>> 
>>>>> This has been the situation in the past,
>>>>> but as you can see in the last month, this has changed.
>>>>> 
>>>>> Yahoo! has publicly committed to moving their development into the
>>>>> main code base, and you can see they have started doing this with the
>>>>> 20.100 branch and their recent commits to trunk.
>>>>> Combine this with Nige taking on the 0.22 release branch (and
>>>>> shepherding it into a stable release), and I think we are addressing
>>>>> your concerns.
>>>>> 
>>>>> They have also started bringing the discussions back on the list, see
>> the recent discussion about Jobtracker-nextgen Arun has re-started in
>> MAPREDUCE-279.
>>>>> 
>>>>> I'm not saying it's perfect, but I think the major players understand
>> there is an issue, and they are *ALL* moving in the right direction.
>>>> 
>>>> I would enthusiastically like to see your optimism verified.
>>>> Maybe I'm misreading the statements issued publicly, but I don't think
>>>> that this is fully understood. I agree though that it's a move in
>>>> the right direction.
>>>> 
>>>>>> This is open source development upside down.
>>>>>> It is not OK for people to diff ASF svn against their internal code
>>>>>> and provide the diff as a patch without first reviewing the IP of
>>>>>> every line of code changed.
>>>>>> For larger chunks I'd suggest even going via the Incubator IP
>>>>>> clearance process.
>>>>>> Only then will we force committers to primarily work here in the open
>>>>>> and return to what I'd consider a healthy project.
>>>>>> 
>>>>>> To be honest: Hadoop is in the process of falling apart.
>>>>>> Contrib code gets moved out of Apache instead of being maintained
>>>>>> here.
>>>>>> Discussions are seldom consensus-driven.
>>>>>> Release branches stagnate.
>>>>> 
>>>>> True. Releases do take a long time. This is mainly due to it being
>>>>> extremely hard to test and verify that a release is stable.
>>>>> It's not enough to just run the thing on 4 machines; you need at
>>>>> least 50 to test some of the major problems. This requires some
>>>>> serious $ for someone to verify.
>>>> 
>>>> It has been proposed on the list before, IIRC. Don't know how to get
>>>> there, but the project seriously needs access to a cluster of this
>>>> size.
>>>> 
>>>>>> Downstream projects like HBase don't get proper support.
>>>>>> Production setups are made from 3rd party distributions.
>>>>>> Development is not happening here, but elsewhere behind corporate
>> doors.
>>>>>> Discussions about future developments are started on corporate blogs (
>>>>>> 
>> http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/
>>>>>> ) instead of on the proper mailing list.
>>>>>> Hurdles for committing are way too high.
>>>>>> On the bright side, new committers and PMC members are added, this is
>>>>>> an improvement.
>>>>>> 
>>>>>> I'd suggest moving away from relying on large code dumps from
>>>>>> corporations, and moving back to the ASF-proven "individual committer
>>>>>> commits on trunk" model where more committers can get involved.
>>>>>> If that means not supporting high-end cluster sizes for some months,
>>>>>> well, so be it.
>>>>> 
>>>>>> Average committers cannot run - e.g. test - on high-end
>>>>>> cluster sizes. If that would mean they cannot participate, then
>>>>>> the open source project had better concentrate on small and medium
>>>>>> sized clusters instead.
>>>>> 
>>>>> 
>>>>> Well, that's one approach, but there are several companies out there
>>>>> who rely on Apache's Hadoop to power their large clusters, so I'd
>>>>> hate to see Hadoop become something that only runs well on
>>>>> 10 nodes, as I don't think that will help anyone either.
>>>> 
>>>> But only looking at high-end scale doesn't help either.
>>>> 
>>>> Let's face the fact that Hadoop is now moving from the early adopters
>>>> phase into a much broader market. I predict that small to medium sized
>>>> clusters will be the majority of Hadoop deployments in a few months'
>>>> time. 4000, or even 500, machines is the high-end range. If the open
>>>> source project Hadoop cannot support those users adequately (without
>>>> becoming defunct), the committership might be better off focusing on
>>>> the low-end and medium sized users.
>>>> 
>>>> I'm not suggesting we turn away from the handful (?) of high-end
>>>> users. They certainly have most valuable input. But also, *they*
>>>> obviously have the resources in terms of larger clusters and
>>>> developers to deal with their specific setups. Obviously, they don't
>>>> need to rely on the open source project to make releases. In fact,
>>>> they *do* work on their own Hadoop derivatives.
>>>> All the other users, the hundreds of boring small cluster users, don't
>>>> have that choice. They *depend* on the open source releases.
>>>> 
>>>> Hadoop is an Apache project, meant to provide HDFS and MR free of
>>>> charge to the general public. Not only to me, nor only to one or two
>>>> big companies.
>>>> Focus on all the users.
>>>> 
>>>> Bernd
>> 
>> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera
