It's true we could package and use HDFS and QJM together if QJM were
distributed separately, since we supply our internal users with an Apache
Hadoop distribution carrying our additional polish. Cloudera, Hortonworks,
or others could package them together too, but then what of the Apache
Hadoop distribution itself? Apache Hadoop would be less user-friendly than
it needs to be. And the larger point here, I think, is finally fixing that
SPOF without introducing another. But that's just one user's opinion...
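
For what it's worth, the kind of deployment we have in mind looks roughly
like the following hdfs-site.xml fragment (a sketch only, along the lines
of the Hadoop 2 HA docs; the nameservice ID, hostnames, and paths below are
placeholders, not our actual values):

  <!-- One logical nameservice backed by an active/standby NameNode pair -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>

  <!-- Shared edits go to a quorum of JournalNodes instead of an NFS filer -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
  </property>

  <!-- Local directory where each JournalNode stores its copy of the edits -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/1/dfs/jn</value>
  </property>

  <!-- Clients discover and fail over between the two NameNodes -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

The point being: no filer, no NFS mount options to babysit, and the main
extra moving parts are the three (or five) JournalNode daemons.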

On Thursday, September 27, 2012, Konstantin Shvachko wrote:

> On Thu, Sep 27, 2012 at 1:50 AM, Andrew Purtell <apurt...@apache.org>
> wrote:
> > Speaking as an Apache Hadoop user who must do something with the NameNode
> > single point of failure this year, I don't subscribe to the view that
> > moving that SPOF from the NameNode to an NFS filer is reasonable to ask of
> > those not already set up with NetApp or similar, or those running in a
> > "cloud" environment, or those quite common deployments (sadly) where our
> > legacy datacenter designs are not... ideal. I would be curious how common
> > this opinion is (or not).
>
> Well, I was arguing the same thing about NFS from the beginning of
> this HA design.
> I am glad to hear it now, even though it's too late.
>
> > So we intend to run HDFS 2 in HA configuration using the QJM for edit log
> > persistence, fencing, and recovery.
>
> Whether it is in core HDFS or not, right? Because it's a matter of
> packaging.
>
> > Also, there is a BookKeeper-based journal manager already under
> > development in HDFS, in trunk and on branch-2. Occasionally I've broken
> > it while patching up HDFS. I suppose that should come out too? But I
> > would think that not a good idea either, per the above reasoning.
>
> Competing technologies should exist outside the project.
> Let different distributions compete outside the core.
>
> --Konst
>
> > On Thursday, September 27, 2012, Konstantin Shvachko wrote:
> >
> >> Hi Todd,
> >>
> >> > I had said previously that it's worth
> >> > discussing if several other people believe the same.
> >>
> >> Well, let's put it on the general list for discussion then?
> >> It seems to me an important issue for Hadoop evolution in general.
> >> We keep growing the HDFS umbrella with competing technologies
> >> (http/web HDFS as an example) within it, which makes the project
> >> harder to stabilize and release.
> >> Not touching MR/Yarn here.
> >>
> >> > If at some point in the future, the internal APIs have fully
> >> > stabilized (security, IPC, edit log streams, JournalManager, metrics,
> >> > etc) then we can pull it out at that time.
> >>
> >> By that time it will monolithically grow into HDFS and vice versa.
> >>
> >> > I know that we plan to ship it as part of CDH, and it will be our
> >> > recommended way of running HA HDFS.
> >>
> >> Sounds like CDH is moving well in release plans and otherwise.
> >> My concern is that if we add another 6000 lines of code to Hadoop-2,
> >> it will take yet another x months for stabilization.
> >> Meanwhile, it is not clear why people cannot just use NFS filers for
> >> shared storage, as you originally designed.
> >>
> >> > distros. Moving it to an entirely separate standalone project will
> >> > just add extra work for these folks who, like us, think it's currently
> >> > the best option for HA log storage.
> >>
> >> Don't know who these folks are. I see it as less work for the HDFS
> >> community, because there is no need to port and support this project
> >> in two or more different versions.
> >>
> >> Thanks,
> >> --Konstantin
> >>
> >> On Wed, Sep 26, 2012 at 10:50 AM, Todd Lipcon <t...@cloudera.com>
> >> wrote:
> >> > On Tue, Sep 25, 2012 at 11:21 PM, Konstantin Shvachko
> >> > <shv.had...@gmail.com> wrote:
> >> >> I think this is great work, Todd.
> >> >> And I think we should not merge it into trunk or other branches.
> >> >> As I suggested earlier on this list, I think this should be spun off
> >> >> as a separate project or a subproject.
> >> >>
> >> >> - The code is well detached as a self-contained package.
> >> >
> >> > The addition is mostly self-contained, but it makes use of a bunch of
> >> > "private" parts of HDFS and Common:
> >> > - Reuses all of the Hadoop security infrastructure, IPC, metrics, etc
> >> > - Coupled to the JournalManager interface which is still evolving. In
> >> > fact there were several patches in trunk which were done during the
> >> > development of this project, specifically to make this API more
> >> > general. There's still some further work to be done in this area on
> >> > the generic interface -- eg support for upgrade/rollback.
> >> > - The functional tests make use of a bunch of "private" HDFS APIs
> >> > as well.
> >> >
> >> >> - It is a logically stand-alone project that can be replaced by other
> >> >> technologies.
> >> >> - If it is a separate project then there is no need to port it to
> >> >> other versions. You can package it as a dependent jar.
> >> >
> >> > Per above, it's not that separate, because in order to build it, we
> >> > had to make a number of changes to core HDFS internal interfaces. It
> >> > currently couldn't be used to store anything except for NN logs. It
> >> > would be a nice extension to truly separate it out into a
> >> > content-agnostic quorum-based edit log, but today it actually uses the
> >> > existing edit log validation code to determine valid lengths, etc.
> >> >
> >> >> - Finally, it will be a good precedent of spinning new projects
> >> >> out of HDFS rather than bringing everything under the HDFS umbrella.
> >> >>
> >> >> Todd, I had a feeling you were in favor of this direction?
> >> >
> >> > I'm not in favor of it - I had said previously that it's worth
> >> > discussing if several other people believe the same.
> >> >
> >> > I know that we plan to ship it as part of CDH, and it will be our
> >> > recommended way of running HA HDFS. If the community doesn't accept
> >> > the contribution, and prefers that we maintain it in a fork on github,
> >> > then it's worth hearing. But I imagine that many other community
> >> > members will want to either use it or ship it as part of their
> >> > distros. Moving it to an entirely separate standalone project will
> >> > just add extra work for these folks who, like us, think it's currently
> >> > the best option for HA log storage.
> >> >
> >> > If at some point in the future, the internal APIs have fully
> >> > stabilized (security, IPC, edit log streams, JournalManager, metrics,
> >> > etc) then we can pull it out at that time.
> >> >
> >> > -Todd
> >> >
> >> >> On Tue, Sep 25, 2012 at 4:58 PM, Eli Collins <e...@cloudera.com>
> >> >> wrote:
> >> >>> +1   Awesome work Todd.
> >> >>>
> >> >>> On Tue, Sep 25, 2012 at 4:02 PM, Todd Lipcon <t...@cloudera.com>
> >> >>> wrote:
> >> >>>> Dear fellow HDFS developers,
> >> >>>>
> >> >>>> Per my email thread last week ("Heads up: merge for QJM branch
> >> >>>> soon" at http://markmail.org/message/vkyh5culdsuxdb6t) I would
> >> >>>> like to propose merging the HDFS-3077 branch into trunk. The
> >> >>>> branch has been active since mid-July and has stabilized
> >> >>>> significantly over the last two months. It has passed the full
> >> >>>> test suite, findbugs, and release audit, and I think it's ready
> >> >>>> to merge at this point.
> >> >>>>
> >> >>>> The branch has been fully deve



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
