Thanks. It seems this isn't the right way, but I learned a lot from you.
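By the way, Ari's 12.5% figure can be checked with a quick simulation. This is only a rough sketch: the 100-node cluster size, the block count, and the uniformly random replica placement over distinct nodes are my own assumptions for illustration, and they ignore HDFS's rack-aware placement policy.

```python
import random

# Monte Carlo check of the "lose half the cluster" scenario discussed
# in the quoted thread: with replication factor 3, roughly (1/2)^3 =
# 12.5% of blocks should have had all replicas on the dead half.
# Cluster size and block count are invented illustration values.
random.seed(0)
nodes = list(range(100))     # hypothetical 100-node cluster
dead = set(nodes[:50])       # half the nodes die at once
replication = 3
blocks = 100_000

lost = 0
for _ in range(blocks):
    # assumption: replicas on 3 distinct nodes chosen uniformly at random
    replicas = random.sample(nodes, replication)
    if all(r in dead for r in replicas):
        lost += 1

print("fraction of blocks lost: %.3f" % (lost / blocks))
```

With distinct placement the exact expectation is C(50,3)/C(100,3) ≈ 12.1%, close to the back-of-the-envelope (1/2)^3 = 12.5%; either way, about 87.5% of blocks survive, which is Ari's point.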
2008/9/12 Pete Wyckoff <[EMAIL PROTECTED]>

> You may want to look at Hadoop's proposal for snapshotting, where one can
> take a snapshot's metadata and store it in some disaster-resilient place(s)
> for a rainy day:
>
> https://issues.apache.org/jira/browse/HADOOP-3637
>
> On 9/11/08 10:06 AM, "Dhruba Borthakur" <[EMAIL PROTECTED]> wrote:
>
> > My opinion is to not store file-namespace-related metadata on the
> > datanodes. When a file is renamed, one would have to contact all
> > datanodes to change this metadata. Worse still, if one renames an
> > entire subdirectory, all blocks belonging to all files in the
> > subdirectory would have to be updated. Similarly, if in the future a
> > file can have multiple paths to it (links), a block may belong to two
> > filenames.
> >
> > In the future, if HDFS wants to implement any kind of de-duplication
> > (i.e. if the same block data appears in multiple files, the file
> > system can intelligently keep only one copy of that block), it will
> > be difficult to do.
> >
> > thanks,
> > dhruba
> >
> > On Wed, Sep 10, 2008 at 7:40 PM, 叶双明 <[EMAIL PROTECTED]> wrote:
> >> Thanks, Ari Rabkin!
> >>
> >> 1. I think the cost is very low: if the block size is 10 MB, 1 KB of
> >> metadata per block is only about 0.01% of the disk space.
> >>
> >> 2. Actually, if two racks are lost and replication <= 3, it seems we
> >> cannot recover all the data. But if we lose one rack out of two and
> >> replication >= 2, we can recover all the data.
> >>
> >> 3. Suppose we recover 87.5% of the data. I am not sure whether a
> >> random 87.5% of the data is useful for every user, but when most
> >> files are smaller than the block size we can recover that much data,
> >> and any recovered data may be valuable to some user.
> >>
> >> 4. I guess most small companies or organizations just have a cluster
> >> with 10-100 nodes, and they cannot afford a second HDFS cluster in a
> >> different place, or a SAN.
> >> It is a simple way to ensure data safety for them, and I think they
> >> would be pleased with it.
> >>
> >> 5. We can make it configurable: turn it on when someone needs it,
> >> and off otherwise.
> >>
> >> Glad to discuss this with you!
> >>
> >> 2008/9/11 Ariel Rabkin <[EMAIL PROTECTED]>
> >>
> >>> I don't understand this use case.
> >>>
> >>> Suppose that you lose half the nodes in the cluster. On average,
> >>> 12.5% of your blocks were exclusively stored on the half of the
> >>> cluster that's dead. For many (most?) applications, a random 87.5%
> >>> of the data isn't really useful. Storing metadata in more places
> >>> would let you turn a dead cluster into a corrupt cluster, but not
> >>> into a working one. If you need to survive major disasters, you
> >>> want a second HDFS cluster in a different place.
> >>>
> >>> The thing that might be useful to you, if you're worried about
> >>> simultaneous namenode and secondary namenode failure, is to store
> >>> the edit log and fsimage on a SAN, and get fault tolerance that way.
> >>>
> >>> --Ari
> >>>
> >>> On Tue, Sep 9, 2008 at 6:38 PM, 叶双明 <[EMAIL PROTECTED]> wrote:
> >>>> Thanks for paying attention to my tentative idea!
> >>>>
> >>>> What I had in mind isn't how to store the metadata, but a final
> >>>> (last-resort) way to recover valuable data from the cluster when
> >>>> the worst happens and the metadata on all NameNodes is destroyed,
> >>>> e.g. a terrorist attack or natural disaster destroys half of the
> >>>> cluster nodes, including all NameNodes. With this mechanism we
> >>>> could recover as much data as possible, and we would have a good
> >>>> chance of recovering all the data in the cluster because of the
> >>>> original replication.
> >>>>
> >>>> Any suggestion is appreciated!
> >>>>
> >>>> 2008/9/10 Pete Wyckoff <[EMAIL PROTECTED]>
> >>>>
> >>>>> +1 -
> >>>>>
> >>>>> From the perspective of the datanodes, DFS is just a block-level
> >>>>> store and is thus much more robust and scalable.
> >>>>>
> >>>>> On 9/9/08 9:14 AM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>>> This isn't a very stable direction. You really don't want
> >>>>>> multiple distinct methods for storing the metadata, because
> >>>>>> discrepancies are very bad. High Availability (HA) is a very
> >>>>>> important medium-term goal for HDFS, but it will likely be done
> >>>>>> using multiple NameNodes and ZooKeeper.
> >>>>>>
> >>>>>> -- Owen
> >>>
> >>> --
> >>> Ari Rabkin [EMAIL PROTECTED]
> >>> UC Berkeley Computer Science Department
> >>
> >> --
> >> Sorry for my English!! 明
> >> Please help me to correct my English expression and errors in syntax

--
Sorry for my English!! 明
Please help me to correct my English expression and errors in syntax
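P.S. For anyone curious about the de-duplication Dhruba mentioned: one common way to build it is a content-addressed block store, where blocks are keyed by a hash of their data, so identical data is stored once and a single block can belong to two filenames. A toy sketch follows; the class name, the tiny 4-byte block size, and the API are all invented for illustration and have nothing to do with how HDFS datanodes actually work.

```python
import hashlib

class DedupBlockStore:
    """Toy content-addressed block store: identical block data is kept
    once, and files reference blocks by content hash.
    (Illustrative only -- not an HDFS interface.)"""

    def __init__(self):
        self.blocks = {}   # content hash -> block bytes (stored once)
        self.files = {}    # filename -> ordered list of block hashes

    def put(self, filename, data, block_size=4):
        hashes = []
        for i in range(0, len(data), block_size):
            chunk = data[i:i + block_size]
            h = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(h, chunk)  # skip if already stored
            hashes.append(h)
        self.files[filename] = hashes

    def get(self, filename):
        return b"".join(self.blocks[h] for h in self.files[filename])

store = DedupBlockStore()
store.put("a.txt", b"AAAABBBB")
store.put("b.txt", b"BBBBCCCC")   # the "BBBB" block is shared with a.txt
print(len(store.blocks))          # 3 unique blocks stored, not 4
```

This also shows why Dhruba says namespace metadata on datanodes makes de-duplication hard: here a block has no single owning file, so mapping blocks back to filenames would require multi-valued, constantly updated reverse metadata.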