Re: Large feature development

2012-09-02 Thread Todd Lipcon
Hey Arun,

First, let me apologize if my email came off as a personal "snipe"
against the project or anyone working on it. I know the team has been
hard at work for multiple years now on the project, and I certainly
don't mean to denigrate the work anyone has done. I also agree that
the improvements made possible by YARN are tremendously important, and
I've expressed this opinion both online and in interviews with
analysts, etc.

But I'll stand by my point that YARN is, at this point, more "alpha"
than HDFS 2. You brought up two bugs in the HDFS 2 code base as
examples of HDFS 2 not being high quality. The first, HDFS-3626, was
indeed a messy bug, but it had nothing to do with HA, the edit log
rewrite, or any of the other changes being discussed in this thread.
The bug has been there since the "beginning of time", and is present
in Hadoop 1.0.x as well (which is why the JIRA is still open). You
simply need to pass a non-canonicalized path through the Path(URI)
constructor, and you'll see the same behavior in every release,
including 1.0.x, 0.20.x, and earlier. It shows up more often in
Hadoop 2 because of the FsShell rewrite -- not because of any changes
in HDFS itself, and certainly not because of HA, as you've implied
here.

The other bug caused blocksBeingWritten to disappear upon upgrade.
This, too, had nothing to do with any of the features being discussed
in this thread, and in fact only affects a cluster that is taken down
_uncleanly_ prior to an upgrade. Upon starting the upgraded cluster,
the user would be alerted to the missing blocks and could roll back
with no lost data. So, while it should be fixed (and has been), I
wouldn't consider it particularly frightening. Most users I am aware
of do a "clean" shutdown of services like HBase before trying to
upgrade their cluster, and, worst case, they would see the issue
immediately after the upgrade and perform a rollback with no adverse
effects.

In branch-1, however, I've seen other bugs that I'd consider much more
scary. Two in particular come to mind and together represent the vast
majority of cases in which we've seen customers experience data
corruption: HDFS-3652 and HDFS-2305. These two bugs were branch-1
only, and were never present in Hadoop 2, thanks to the "edit log
rewrite" project (HDFS-1073).

So, at the risk of this thread becoming just a laundry list of bugs
that have existed in HDFS, or a list of bugs in YARN, I'll summarize: I
still think that YARN is "alpha" and HDFS 2 is at least as "stable" as
Hadoop 1.0. We have customers running HDFS 2 for production workloads,
in multi-rack clusters, with great success. But this has nothing to do
with the thread at hand, so I'll raise the question of
alpha/beta/stable labeling in the context of our next release vote,
and hope we can go back to the more fruitful discussion of how to
encourage large feature development while maintaining stability.

Thanks
-Todd

On Sun, Sep 2, 2012 at 3:11 PM, Arun Murthy wrote:
> Eli,
>
> On Sep 2, 2012, at 1:01 PM, Eli Collins wrote:
>
>> On Sat, Sep 1, 2012 at 12:47 PM, Arun C Murthy wrote:
>>> Todd,
>>>
>>> On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:
>>>
>>>> I'd actually contend that YARN was merged too early. I have yet to see
>>>> anyone running YARN in production, and it's holding up the "Stable"
>>>> moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
>>>> I'm seeing fewer issues in our customers running Hadoop HDFS 2
>>>> compared to Hadoop 1-derived code.
>>>
>>> You know I respect you a ton, but I'm very saddened to see you perpetuate 
>>> this FUD on our public lists. I expected better, particularly when everyone 
>>> is working towards the same goals of advancing Hadoop-2. This sniping on 
>>> other members doing work is, um, I'll just stop here rather than regret 
>>> later.
>> 2. HDFS is more mature than YARN. Not a surprise given that we all
>> agree YARN is alpha, and a much newer project than HDFS that hasn't
>> been deployed in production environments yet (to my knowledge).
>
> Let's focus on the ground reality here.
>
> Please read my (or Rajiv's) message again about YARN's current
> stability, how much it has baked, and its planned deployment to a very
> large cluster in a few *days*. Or talk to the people developing,
> testing and supporting these customers and clusters.
>
> I'll repeat - YARN has clearly baked much more than HDFS HA, given
> the basic bugs (upgrade, edit log corruption, etc.) we've seen after
> it was declared *done*; but then we just disagree, since clearly I'm
> more conservative. Also, we need to be more conservative wrt HDFS -
> but then what would I know...
>
> I'll admit it's hard to discuss with someone (or a collective) who
> just repeat themselves. Plus, I broke my own rule about email this
> weekend - so, I'll try harder.
>
> Arun



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Large feature development

2012-09-02 Thread Arun Murthy
Eli,

On Sep 2, 2012, at 1:01 PM, Eli Collins wrote:

> On Sat, Sep 1, 2012 at 12:47 PM, Arun C Murthy wrote:
>> Todd,
>>
>> On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:
>>
>>> I'd actually contend that YARN was merged too early. I have yet to see
>>> anyone running YARN in production, and it's holding up the "Stable"
>>> moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
>>> I'm seeing fewer issues in our customers running Hadoop HDFS 2
>>> compared to Hadoop 1-derived code.
>>
>> You know I respect you a ton, but I'm very saddened to see you perpetuate 
>> this FUD on our public lists. I expected better, particularly when everyone 
>> is working towards the same goals of advancing Hadoop-2. This sniping on 
>> other members doing work is, um, I'll just stop here rather than regret 
>> later.
> 2. HDFS is more mature than YARN. Not a surprise given that we all
> agree YARN is alpha, and a much newer project than HDFS that hasn't
> been deployed in production environments yet (to my knowledge).

Let's focus on the ground reality here.

Please read my (or Rajiv's) message again about YARN's current
stability, how much it has baked, and its planned deployment to a very
large cluster in a few *days*. Or talk to the people developing,
testing and supporting these customers and clusters.

I'll repeat - YARN has clearly baked much more than HDFS HA, given
the basic bugs (upgrade, edit log corruption, etc.) we've seen after
it was declared *done*; but then we just disagree, since clearly I'm
more conservative. Also, we need to be more conservative wrt HDFS -
but then what would I know...

I'll admit it's hard to discuss with someone (or a collective) who
just repeat themselves. Plus, I broke my own rule about email this
weekend - so, I'll try harder.

Arun


Re: Large feature development

2012-09-02 Thread Eli Collins
On Sat, Sep 1, 2012 at 12:47 PM, Arun C Murthy wrote:
> Todd,
>
> On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:
>
>> I'd actually contend that YARN was merged too early. I have yet to see
>> anyone running YARN in production, and it's holding up the "Stable"
>> moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
>> I'm seeing fewer issues in our customers running Hadoop HDFS 2
>> compared to Hadoop 1-derived code.
>
> You know I respect you a ton, but I'm very saddened to see you perpetuate 
> this FUD on our public lists. I expected better, particularly when everyone 
> is working towards the same goals of advancing Hadoop-2. This sniping on 
> other members doing work is, um, I'll just stop here rather than regret later.

Todd is just saying that:

1. HDFS v2 has fewer critical bugs than v1 (mostly thanks to the edit
log rewrite, which, aside from HA, was motivated by all the quality
issues the v1 code has had).

2. HDFS is more mature than YARN. Not a surprise given that we all
agree YARN is alpha, and a much newer project than HDFS that hasn't
been deployed in production environments yet (to my knowledge).

I don't read this as a snipe against anyone coding on Hadoop; it's
just that the two sub-projects are at different stages in their life
and development.

Thanks,
Eli


Re: Large feature development

2012-09-02 Thread Eli Collins
On Sun, Sep 2, 2012 at 7:58 AM, Steve Loughran wrote:
> On 1 September 2012 09:20, Todd Lipcon wrote:
>
>> Thanks for starting this thread, Steve. I think your points below are
>> good. I've snipped most of your comment and will reply inline to one
>> bit below:
>>
>> On Fri, Aug 31, 2012 at 10:07 AM, Steve Loughran wrote:
>>
>>
>> >
>> > How then do we get (a) more dev projects working and integrated by the
>> > current committers, and (b) a process in which people who are not yet
>> > contributors/committers can develop non-trivial changes to the project
>> > in a way that it is done with the knowledge, support and mentorship of
>> > the rest of the community?
>>
>>
> Both HDFS2 and MRv2 are in trunk, therefore I consider them successes.
>
>
>> Here's one proposal, making use of git as an easy way to allow
>> non-committers to "commit" code while still tracking development in
>> the usual places:
>>
>
> This is effectively what people do. I'm less worried about the code side of
> things than the integration and mentoring.
>
>
>> - Upon anyone's request, we create a new "Version" tag in JIRA.
>>
>
> -1. There are enough versions. There is a "tag" field in JIRA for precisely
> this purpose.
>
>
>> - The developers create an umbrella JIRA for the project, and file the
>> individual work items as subtasks (either up front, or as they are
>> developed if using a more iterative model)
>>
>
> as today
>
>
>> - On the umbrella, they add a pointer to a git branch to be used as
>> the staging area for the branch. As they develop each subtask, they
>> can use the JIRA to discuss the development like they would with a
>> normally committed JIRA, but when they feel it is ready to go (not
>> requiring a +1 from any committer) they commit to their git branch
>> instead of the SVN repo.
>>
>
> Some integration w/ Jenkins and pull testing would be good here.
>
>
>> - When the branch is ready to merge, they can call a merge vote, which
>> requires +1 from 3 committers, same as a branch being proposed by an
>> existing committer. A committer would then use git-svn to merge their
>> branch commit-by-commit, or if it is less extensive, simply generate a
>> single big patch to commit into SVN.
>>
>> My thinking is that this would provide a low-friction way for people
>> to collaborate with the community and develop in the open, without
>> having to work closely with any committer to review every individual
>> subtask.
>>
>> Another alternative, if people are reluctant to use git, would be to
>> add a "sandbox/" repository inside our SVN, and hand out commit bit to
>> branches inside there without any PMC vote. Anyone interested in
>> contributing could request a branch in the sandbox, and be granted
>> access as soon as they get an apache SVN account.
>>
>>
> I don't see the technical issues with how the merge is done as the main
> problem.
>
> The barriers to getting your stuff in are
> 1. getting people to care enough to help develop the feature: mentorship,
> collaborative development.
> 2. getting incremental parts in to avoid the continual
> merge-regression-test hell that you go through if you are trying to keep a
> separate branch alive. It's not the technical aspects of the merge so much
> as the need to run all the Hadoop tests and your own test suite, and track
> down whether a failure is a regression in -trunk or something in your code.
>
> Jun's patch is an example of this situation. We haven't seen the effort he
> and his colleagues have put into merging and testing, but I'm confident
> it's been there. What they now have is a "big bang" class of patch which is
> so big that anyone reviewing it would have to spend a couple of weeks going
> through the codebase trying to understand it. Which, as we all know, means
> two weeks not doing all the things you are committed to doing.
>
> We know it's there, we know it's current, so how do we use this as an
> exercise in pulling something in incrementally?

Jun's patches from HADOOP-8468 (which were developed on a private
github repo) are being pulled incrementally into trunk; there's no
feature branch (which I think would have been a better route, but at
least the current approach has not prevented some progress).

All the recent examples of features that I can think of that have been
developed upstream first at Apache on feature branches have gone well.

Thanks,
Eli


Re: Large feature development

2012-09-02 Thread Steve Loughran
On 1 September 2012 09:20, Todd Lipcon wrote:

> Thanks for starting this thread, Steve. I think your points below are
> good. I've snipped most of your comment and will reply inline to one
> bit below:
>
> On Fri, Aug 31, 2012 at 10:07 AM, Steve Loughran wrote:
>
>
> >
> > How then do we get (a) more dev projects working and integrated by the
> > current committers, and (b) a process in which people who are not yet
> > contributors/committers can develop non-trivial changes to the project
> > in a way that it is done with the knowledge, support and mentorship of
> > the rest of the community?
>
>
Both HDFS2 and MRv2 are in trunk, therefore I consider them successes.


> Here's one proposal, making use of git as an easy way to allow
> non-committers to "commit" code while still tracking development in
> the usual places:
>

This is effectively what people do. I'm less worried about the code side of
things than the integration and mentoring.


> - Upon anyone's request, we create a new "Version" tag in JIRA.
>

-1. There are enough versions. There is a "tag" field in JIRA for precisely
this purpose.


> - The developers create an umbrella JIRA for the project, and file the
> individual work items as subtasks (either up front, or as they are
> developed if using a more iterative model)
>

as today


> - On the umbrella, they add a pointer to a git branch to be used as
> the staging area for the branch. As they develop each subtask, they
> can use the JIRA to discuss the development like they would with a
> normally committed JIRA, but when they feel it is ready to go (not
> requiring a +1 from any committer) they commit to their git branch
> instead of the SVN repo.
>

Some integration w/ Jenkins and pull testing would be good here.


> - When the branch is ready to merge, they can call a merge vote, which
> requires +1 from 3 committers, same as a branch being proposed by an
> existing committer. A committer would then use git-svn to merge their
> branch commit-by-commit, or if it is less extensive, simply generate a
> single big patch to commit into SVN.
>
> My thinking is that this would provide a low-friction way for people
> to collaborate with the community and develop in the open, without
> having to work closely with any committer to review every individual
> subtask.
>
> Another alternative, if people are reluctant to use git, would be to
> add a "sandbox/" repository inside our SVN, and hand out commit bit to
> branches inside there without any PMC vote. Anyone interested in
> contributing could request a branch in the sandbox, and be granted
> access as soon as they get an apache SVN account.
>
>
I don't see the technical issues with how the merge is done as the main
problem.

The barriers to getting your stuff in are
1. getting people to care enough to help develop the feature: mentorship,
collaborative development.
2. getting incremental parts in to avoid the continual
merge-regression-test hell that you go through if you are trying to keep a
separate branch alive. It's not the technical aspects of the merge so much
as the need to run all the Hadoop tests and your own test suite, and track
down whether a failure is a regression in -trunk or something in your code.

Jun's patch is an example of this situation. We haven't seen the effort he
and his colleagues have put into merging and testing, but I'm confident
it's been there. What they now have is a "big bang" class of patch which is
so big that anyone reviewing it would have to spend a couple of weeks going
through the codebase trying to understand it. Which, as we all know, means
two weeks not doing all the things you are committed to doing.

We know it's there, we know it's current, so how do we use this as an
exercise in pulling something in incrementally?

-Steve