Re: Large feature development

Todd Lipcon Sun, 02 Sep 2012 18:13:11 -0700

Hey Arun,

First, let me apologize if my email came off as a personal "snipe"
against the project or anyone working on it. I know the team has been
hard at work for multiple years now on the project, and I certainly
don't mean to denigrate the work anyone has done. I also agree that
the improvements made possible by YARN are tremendously important, and
I've expressed this opinion both online and in interviews with
analysts, etc.

But, I'll stand by my point that YARN is at this point more "alpha"
than HDFS 2. You brought up two bugs in the HDFS 2 code base as examples
of HDFS 2 not being high quality. The first, HDFS-3626, was indeed a
messy bug, but had nothing to do with HA, the edit log rewrite, or any
of the other changes being discussed in the thread. In fact, the bug
has been there since the "beginning of time", and is also present
in Hadoop 1.0.x (which is why the JIRA is still open). You
simply need to pass a non-canonicalized path to the Path(URI)
constructor, and you'll see the same behavior in every release,
including 1.0.x, 0.20.x, and earlier. The reason it shows up more often
in Hadoop 2 is the FsShell rewrite -- not any changes
in HDFS itself, and certainly not anything related to HA, as you've
implied here.

The other bug causes blocksBeingWritten to disappear upon upgrade.
This, also, had nothing to do with any of the features being discussed
in this thread, and in fact only impacts a cluster which is taken down
_uncleanly_ prior to an upgrade. Upon starting the upgraded cluster,
the user would be alerted to the missing blocks and could roll back
with no lost data. So, while it should be fixed (and has been), I
wouldn't consider it particularly frightening. Most users I am aware
of do a "clean" shutdown of services like HBase before trying to
upgrade their cluster, and, worst case, they would see the issue
immediately after the upgrade and perform a rollback with no adverse
effects.

In branch-1, however, I've seen other bugs that I'd consider much more
scary. Two in particular come to mind and together represent the vast
majority of cases in which we've seen customers experience data
corruption: HDFS-3652 and HDFS-2305. These two bugs were branch-1
only, and never present in Hadoop 2 due to the "edit log rewrite"
project (HDFS-1073).

So, at risk of this thread just becoming a laundry list of bugs that
have existed in HDFS, or a list of bugs in YARN, I'll summarize: I
still think that YARN is "alpha" and HDFS 2 is at least as "stable" as
Hadoop 1.0. We have customers running it for production workloads, in
multi-rack clusters, with great success. But this has nothing to do
with the thread at hand, so I'll raise the question of
alpha/beta/stable labeling in the context of our next release vote,
and hope we can go back to the more fruitful discussion of how to
encourage large feature development while maintaining stability.

Thanks
-Todd

On Sun, Sep 2, 2012 at 3:11 PM, Arun Murthy <a...@hortonworks.com> wrote:
> Eli,
>
> On Sep 2, 2012, at 1:01 PM, Eli Collins <e...@cloudera.com> wrote:
>
>> On Sat, Sep 1, 2012 at 12:47 PM, Arun C Murthy <a...@hortonworks.com> wrote:
>>> Todd,
>>>
>>> On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:
>>>
>>>> I'd actually contend that YARN was merged too early. I have yet to see
>>>> anyone running YARN in production, and it's holding up the "Stable"
>>>> moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
>>>> I'm seeing fewer issues in our customers running Hadoop HDFS 2
>>>> compared to Hadoop 1-derived code.
>>>
>>> You know I respect you a ton, but I'm very saddened to see you perpetuate 
>>> this FUD on our public lists. I expected better, particularly when everyone 
>>> is working towards the same goals of advancing Hadoop-2. This sniping on 
>>> other members doing work is, um, I'll just stop here rather than regret 
>>> later.
>> 2. HDFS is more mature than YARN. Not a surprise given that we all
>> agree YARN is alpha, and a much newer project than HDFS that hasn't
>> yet been deployed in production environments (to my knowledge).
>
> Let's focus on the ground reality here.
>
> Please read my (or Rajiv's) message again about YARN's current
> stability and how much it's baked, its deployment plans to a very
> large cluster in a few *days*. Or, talk to the people developing,
> testing and supporting these customers and clusters.
>
> I'll repeat - YARN has clearly baked much more than HDFS HA given
> the basic bugs (upgrade, edit logs corruption etc.) we've seen after
> being declared *done*; but then we just disagree since clearly I'm
> more conservative. Also, we need to be more conservative wrt HDFS -
> but then what would I know...
>
> I'll admit it's hard to discuss with someone (or a collective) who
> just repeat themselves. Plus, I broke my own rule about email this
> weekend - so, I'll try harder.
>
> Arun

-- 
Todd Lipcon
Software Engineer, Cloudera
