My opinion is to not store file-namespace related metadata on the
datanodes. When a file is renamed, one would have to contact all
datanodes to update that metadata. Worse still, if one renames an entire
subdirectory, all blocks that belong to all files in the subdirectory
have to be updated. Similarly, if in the future a file can have multiple
paths to it (hard links), a block may belong to two filenames.
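
To illustrate the fan-out, here is a minimal sketch (hypothetical code,
not anything in HDFS; all names are invented):

    import java.util.List;

    // Hypothetical: if each datanode kept a path -> block mapping,
    // a rename would fan out across the whole cluster.
    interface DataNodeStub {
        // one RPC per block whose path changed
        void updatePath(long blockId, String newPath);
    }

    class NaiveRename {
        // Renaming a subdirectory touches every block of every file under
        // it, on every datanode: O(datanodes x blocks) round trips, versus
        // a single in-memory edit when the namenode owns the namespace.
        static void renameSubtree(List<DataNodeStub> allDataNodes,
                                  List<Long> blocksUnderDir,
                                  String newPathPrefix) {
            for (DataNodeStub dn : allDataNodes) {
                for (long blockId : blocksUnderDir) {
                    dn.updatePath(blockId, newPathPrefix); // network round trip
                }
            }
        }
    }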

In the future, if HDFS wants to implement any kind of de-duplication
(i.e. if the same block data appears in multiple files, the file system
can intelligently keep only one copy of the block), it will be difficult
to do.
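
For what it's worth, here is a rough sketch of why anonymous blocks make
de-duplication tractable: a content-addressed store can map two filenames
to one stored block. Again hypothetical code, all names invented:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical content-addressed block store: because a datanode
    // does not know or care which file a block belongs to, the same
    // bytes arriving under two filenames can share one stored block.
    class DedupBlockStore {
        private final Map<String, Long> checksumToBlock = new HashMap<String, Long>();
        private final Map<Long, Integer> refCount = new HashMap<Long, Integer>();
        private long nextBlockId = 0;

        // Returns the id of the block holding this data, writing the
        // bytes only the first time this checksum is seen.
        synchronized long store(String checksum, byte[] data) {
            Long existing = checksumToBlock.get(checksum);
            if (existing != null) {
                refCount.put(existing, refCount.get(existing) + 1);
                return existing; // a second file reuses the same block
            }
            long id = nextBlockId++;
            // writeToDisk(id, data);  // actual disk I/O elided
            checksumToBlock.put(checksum, id);
            refCount.put(id, 1);
            return id;
        }
    }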

thanks,
dhruba



On Wed, Sep 10, 2008 at 7:40 PM, 叶双明 <[EMAIL PROTECTED]> wrote:
> Thanks Ari Rabkin!
>
> 1. I think the cost is very low: if the block size is 10 MB, 1 KB of
> metadata is almost 0.01% of the disk space.
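>
> (To spell out the arithmetic: 1 KB of metadata per 10 MB block is
> 1/10,240 of the space, i.e. roughly 0.01%.)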
>
> 2. Actually, if two racks are lost and replication <= 3, it seems that we
> can't recover all the data. But if only one of two racks is lost and
> replication >= 2, we can recover all the data.
>
> 3. Suppose we recover 87.5% of the data. I am not sure whether a random
> 87.5% of the data is useful for every user. But when most files are
> smaller than the block size, we can recover that much data, and any
> recovered data may be valuable to some user.
>
> 4. I guess most small companies or organizations just have a cluster with
> 10-100 nodes, and they cannot afford a second HDFS cluster in a different
> place, or a SAN. This is a simple way to ensure data safety, and I think
> they would be pleased to have it.
>
> 5. We can make it a configuration option: turn it on when someone needs
> it, and turn it off otherwise.
>
> Glad to discuss with you!
>
>
> 2008/9/11 Ariel Rabkin <[EMAIL PROTECTED]>
>
>> I don't understand this use case.
>>
>> Suppose that you lose half the nodes in the cluster.  On average,
>> 12.5% of your blocks were exclusively stored on the half of the cluster
>> that's dead.  For many (most?) applications, a random 87.5% of the
>> data isn't really useful.  Storing metadata in more places would let
>> you turn a dead cluster into a corrupt cluster, but not into a working
>> one.   If you need to survive major disasters, you want a second HDFS
>> cluster in a different place.
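>>
>> (To make the arithmetic explicit: with replication 3 and replicas placed
>> independently, the chance that all three copies of a block sit on the
>> dead half is (1/2)^3 = 1/8 = 12.5%, so on average 87.5% of blocks
>> survive.)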
>>
>> The thing that might be useful to you, if you're worried about
>> simultaneous namenode and secondary NN failure, is to store the edit
>> log and fsimage on a SAN, and get fault tolerance that way.
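>>
>> For example, dfs.name.dir accepts a comma-separated list of directories,
>> and the namenode writes its fsimage and edit log to each of them, so a
>> SAN mount can be listed alongside the local disk (paths here are just
>> illustrative):
>>
>>   <property>
>>     <name>dfs.name.dir</name>
>>     <value>/local/hadoop/name,/san/hadoop/name</value>
>>   </property>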
>>
>> --Ari
>>
>> On Tue, Sep 9, 2008 at 6:38 PM, 叶双明 <[EMAIL PROTECTED]> wrote:
>> > Thanks for paying attention to my tentative idea!
>> >
>> > What I thought about isn't how to store the metadata, but a final (or
>> > last-resort) way to recover valuable data in the cluster when the worst
>> > happens and the metadata on all the NameNodes is destroyed. E.g. if a
>> > terrorist attack or natural disaster destroys half of the cluster's
>> > nodes, including all the NameNodes, we can recover as much data as
>> > possible by this mechanism, and have a big chance of recovering the
>> > entire cluster's data because of the original replication.
>> >
>> > Any suggestion is appreciated!
>> >
>> > 2008/9/10 Pete Wyckoff <[EMAIL PROTECTED]>
>> >
>> >> +1 -
>> >>
>> >> From the perspective of the datanodes, DFS is just a block-level store,
>> >> and is thus much more robust and scalable.
>> >>
>> >>
>> >>
>> >> On 9/9/08 9:14 AM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:
>> >>
>> >> > This isn't a very stable direction. You really don't want multiple
>> >> > distinct methods for storing the metadata, because discrepancies are
>> >> > very bad. High Availability (HA) is a very important medium-term goal
>> >> > for HDFS, but it will likely be done using multiple NameNodes and
>> >> > ZooKeeper.
>> >> >
>> >> > -- Owen
>> >>
>>
>> --
>> Ari Rabkin [EMAIL PROTECTED]
>> UC Berkeley Computer Science Department
>>
>
>
>
> --
> Sorry for my English!!  明
> Please help me to correct my English expression and errors in syntax
>
