Re: Large feature development

2012-09-03 Thread Arun C Murthy
Todd,

On Sep 2, 2012, at 6:12 PM, Todd Lipcon wrote:

> First, let me apologize if my email came off as a personal "snipe"
> against the project or anyone working on it. I know the team has been
> hard at work for multiple years now on the project, and I certainly
> don't mean to denigrate the work anyone has done. 
> 
> But, I'll stand by my point that YARN is at this point more "alpha"
> than HDFS2.

I'll unfair to tag-team me while consistently ignoring what I write. 
(We are also in danger of hitting the threefold repetition rule: 
http://en.wikipedia.org/wiki/Threefold_repetition. *smile*)


Anyway, I'll repeat, here are the facts on the ground - the work we've done 
testing/stabilizing YARN/MRv2, its stability, user-certification across 
thousands of unique apps, deployment etc. etc.: http://s.apache.org/QVX

> You brought up two bugs in the HDFS2 code base as examples
> of HDFS 2 not being high quality.

Through a lot of words you just agreed with what I said - if people didn't 
upgrade to HDFS2 (not just HA) they wouldn't hit any of these: HDFS-3626, 
HDFS-3731 etc. There are more, e.g. how do folks work around Secondary NN 
not starting up on upgrades from hadoop-1 (HDFS-3597)? They just copy multiple 
PBs over to a new hadoop-2 cluster, or patch SNN themselves post HDFS-1073?

Anyway, I agree, we should talk about this in context of an actual release - 
hadoop-2.1.0 should mark YARN as *beta* IMO - particularly since it will be 
deployed at scale.

Arun




Re: Large feature development

2012-09-03 Thread Arun C Murthy

On Sep 3, 2012, at 12:05 AM, Arun C Murthy wrote:

> Todd,
> 
> I'll unfair to tag-team me while consistently ignoring what I write. 

Ugh, late Sunday night school-boy error - should have read:

I'll point out it's unfair [...]

Arun


Re: Large feature development

2012-09-03 Thread Todd Lipcon
On Mon, Sep 3, 2012 at 12:05 AM, Arun C Murthy  wrote:
>>
>> But, I'll stand by my point that YARN is at this point more "alpha"
>> than HDFS2.
>
> I'll unfair to tag-team me while consistently ignoring what I write.

I'm not sure I ignored what you wrote. I understand that Yahoo is
deploying soon on one of their clusters. That's great news. My
original point was about the state of YARN when it was merged, and the
comment about its current state was more of an aside. Hardly worth
debating further. Best of luck with the deployment next week - I look
forward to reading about how it goes on the list.

>> You brought up two bugs in the HDFS2 code base as examples
>> of HDFS 2 not being high quality.
>
> Through a lot of words you just agreed with what I said - if people didn't 
> upgrade to HDFS2 (not just HA) they wouldn't hit any of these: HDFS-3626,

You could hit this on Hadoop 1, it was just harder to hit.

> HDFS-3731 etc.

The details of this bug have to do with the upgrade/snapshot behavior
of the blocksBeingWritten directory which was added in branch-1. In
fact, the same basic bug continues to exist in branch-1. If you
perform an upgrade, it doesn't hard-link the blocks into the new
"current" directory. Hence, if the upgraded cluster exits safe mode
(causing lease recovery of those blocks), and then the user issues a
rollback, the blocks will have been deleted from the pre-upgrade
image. This broken branch-1 behavior carried over into branch-2 as
well, but it's not a new bug, as I said before.

> There are more, e.g. how do folks work around Secondary NN not starting 
> up on upgrades from hadoop-1 (HDFS-3597)? They just copy multiple PBs over to 
> a new hadoop-2 cluster, or patch SNN themselves post HDFS-1073?

No, they rm -Rf the contents of the 2NN directory, which is completely
safe and doesn't cause data loss in any way. In fact, the bug fix is exactly
that -- it just does the rm -Rf itself, automatically. It's a trivial
workaround similar to how other bugs in the Hadoop 1 branch have
required workarounds in the past. Certainly no data movement or local
patching. The SNN is transient state and can always be cleared.

If you have any questions about other bugs in the 2.x line, feel free
to ask on the relevant JIRAs. I'm still perfectly confident in the
stability of HDFS 2 vs HDFS 1. In fact my cell phone is likely the one
that would ring if any of these production HDFS 2 clusters had an
issue, and I'll offer the same publicly to anyone on this list. If you
experience a corruption or data loss issue on the tip of branch-2
HDFS, email me off-list and I'll personally diagnose the issue. I
would not make that same offer for branch-1 due to the fundamentally
less robust design which has caused a lot of subtle bugs over the past
several years.

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Large feature development

2012-09-03 Thread Arun C Murthy

On Sep 3, 2012, at 12:31 AM, Todd Lipcon wrote:

> On Mon, Sep 3, 2012 at 12:05 AM, Arun C Murthy  wrote:
>>> 
>>> But, I'll stand by my point that YARN is at this point more "alpha"
>>> than HDFS2.
>> 
>> I'll unfair to tag-team me while consistently ignoring what I write.
> 
> I'm not sure I ignored what you wrote. I understand that Yahoo is
> deploying soon on one of their clusters. That's great news. My
> original point was about the state of YARN when it was merged, and the
> comment about its current state was more of an aside. Hardly worth
> debating further. Best of luck with the deployment next week - I look
> forward to reading about how it goes on the list.

Everyone +1'ed the merge, now we'd like to rewrite history?
Also, its current state is much more than what you trivialized as 'deployed to one 
cluster' - again, please read my email on the effort we've undertaken to get 
where we are. That's a lot of work by many tens of people - hardly good form to 
trivialize them as you did.

Arun

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

2012-09-03 Thread Arun C Murthy
Andrew,

On Sep 1, 2012, at 6:32 AM, Andrew Purtell wrote:

> I'd imagine such a MR(v1) in Hadoop, if this happened, would concentrate on
> performance improvements, maybe such things as alternate shuffle plugins.
> Perhaps a HA JobTracker for parity with HDFS. 

Lots of this has already happened in branch-1, please look at:
# JT Availability: MAPREDUCE-3837, MAPREDUCE-4328, MAPREDUCE-4603 (WIP)
# Performance - backports of PureJavaCrc32 in spills (MAPREDUCE-782), fadvise 
backports (MAPREDUCE-3289) and several other misc. fixes.

thanks,
Arun 




Re: Large feature development (YARN vs HDFS)

2012-09-03 Thread Eric Baldeschwieler

Referring back to Chris M.'s thread, this YARN vs HDFS discussion sounds a lot 
like an umbrella project issue to me.

On Sep 2, 2012, at 3:11 PM, Arun Murthy wrote:

> Eli,
> 
> On Sep 2, 2012, at 1:01 PM, Eli Collins  wrote:
> 
>> On Sat, Sep 1, 2012 at 12:47 PM, Arun C Murthy  wrote:
>>> Todd,
>>> 
>>> On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:
>>> 
>>>> I'd actually contend that YARN was merged too early. I have yet to see
>>>> anyone running YARN in production, and it's holding up the "Stable"
>>>> moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
>>>> I'm seeing fewer issues in our customers running Hadoop HDFS 2
>>>> compared to Hadoop 1-derived code.
>>> 
>>> You know I respect you a ton, but I'm very saddened to see you perpetuate 
>>> this FUD on our public lists. I expected better, particularly when everyone 
>>> is working towards the same goals of advancing Hadoop-2. This sniping on 
>>> other members doing work is, um, I'll just stop here rather than regret 
>>> later.
>> 2. HDFS is more mature than YARN. Not a surprise given that we all
>> agree YARN is alpha, and a much newer project than HDFS that hasn't
>> yet been deployed in production environments (to my knowledge).
> 
> Let's focus on the ground reality here.
> 
> Please read my (or Rajiv's) message again about YARN's current
> stability and how much it's baked, and its deployment plans to a very
> large cluster in a few *days*. Or, talk to the people developing,
> testing and supporting these customers and clusters.
> 
> I'll repeat - YARN has clearly baked much more than HDFS HA given
> the basic bugs (upgrade, edit logs corruption etc.) we've seen after
> being declared *done*; but then we just disagree since clearly I'm
> more conservative. Also, we need to be more conservative wrt HDFS -
> but then what would I know...
> 
> I'll admit it's hard to discuss with someone (or a collective) who
> just repeat themselves. Plus, I broke my own rule about email this
> weekend - so, I'll try harder.
> 
> Arun



Re: Large feature development (YARN vs HDFS)

2012-09-03 Thread Arun C Murthy
Agreed... it does seem like a case of 'my wife is prettier'.

Maybe I'm oversensitive and it may even be understandable given how much of my 
waking time I've devoted to YARN over the last 30 months; but I do apologize 
for indulging in the behavior I accused others of. A good night's sleep does 
help in clearing mists. IAC, the point I was trying to quantify is simple - 
current state of YARN is far better than was being characterized here.

We should get back to discussing 'large-feature development' - thanks for 
starting that discussion Steve.

Arun

On Sep 3, 2012, at 2:30 PM, Eric Baldeschwieler wrote:

> 
> Referring back to Chris M.'s thread, this YARN vs HDFS discussion sounds a lot 
> like an umbrella project issue to me.
> 
> On Sep 2, 2012, at 3:11 PM, Arun Murthy wrote:
> 
>> Eli,
>> 
>> On Sep 2, 2012, at 1:01 PM, Eli Collins  wrote:
>> 
>>> On Sat, Sep 1, 2012 at 12:47 PM, Arun C Murthy  wrote:
>>>> Todd,
>>>> 
>>>> On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:
>>>> 
>>>>> I'd actually contend that YARN was merged too early. I have yet to see
>>>>> anyone running YARN in production, and it's holding up the "Stable"
>>>>> moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
>>>>> I'm seeing fewer issues in our customers running Hadoop HDFS 2
>>>>> compared to Hadoop 1-derived code.
>>>> 
>>>> You know I respect you a ton, but I'm very saddened to see you perpetuate 
>>>> this FUD on our public lists. I expected better, particularly when 
>>>> everyone is working towards the same goals of advancing Hadoop-2. This 
>>>> sniping on other members doing work is, um, I'll just stop here rather 
>>>> than regret later.
>>> 2. HDFS is more mature than YARN. Not a surprise given that we all
>>> agree YARN is alpha, and a much newer project than HDFS that hasn't
>>> yet been deployed in production environments (to my knowledge).
>> 
>> Let's focus on the ground reality here.
>> 
>> Please read my (or Rajiv's) message again about YARN's current
>> stability and how much it's baked, and its deployment plans to a very
>> large cluster in a few *days*. Or, talk to the people developing,
>> testing and supporting these customers and clusters.
>> 
>> I'll repeat - YARN has clearly baked much more than HDFS HA given
>> the basic bugs (upgrade, edit logs corruption etc.) we've seen after
>> being declared *done*; but then we just disagree since clearly I'm
>> more conservative. Also, we need to be more conservative wrt HDFS -
>> but then what would I know...
>> 
>> I'll admit it's hard to discuss with someone (or a collective) who
>> just repeat themselves. Plus, I broke my own rule about email this
>> weekend - so, I'll try harder.
>> 
>> Arun
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/