Thanks. It seems this isn't the right way, but I learned a lot from you.
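By the way, Ari's 12.5% figure can be checked with a quick simulation. This is only a rough sketch: the 100-node cluster size, the block count, and the uniformly random replica placement over distinct nodes are my own assumptions for illustration, and they ignore HDFS's rack-aware placement policy.

```python
import random

# Monte Carlo check of the "lose half the cluster" scenario discussed
# in the quoted thread: with replication factor 3, roughly (1/2)^3 =
# 12.5% of blocks should have had all replicas on the dead half.
# Cluster size and block count are invented illustration values.
random.seed(0)
nodes = list(range(100))     # hypothetical 100-node cluster
dead = set(nodes[:50])       # half the nodes die at once
replication = 3
blocks = 100_000

lost = 0
for _ in range(blocks):
    # assumption: replicas on 3 distinct nodes chosen uniformly at random
    replicas = random.sample(nodes, replication)
    if all(r in dead for r in replicas):
        lost += 1

print("fraction of blocks lost: %.3f" % (lost / blocks))
```

With distinct placement the exact expectation is C(50,3)/C(100,3) ≈ 12.1%, close to the back-of-the-envelope (1/2)^3 = 12.5%; either way, about 87.5% of blocks survive, which is Ari's point.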
2008/9/12 Pete Wyckoff <[EMAIL PROTECTED]>

> You may want to look at Hadoop's proposal for snapshotting, where one can
> take a snapshot's metadata and store it in some disaster-resilient place(s)
> for a rainy day:
>
> https://issues.apache.org/jira/browse/HADOOP-3637
>
> On 9/11/08 10:06 AM, "Dhruba Borthakur" <[EMAIL PROTECTED]> wrote:
>
> > My opinion is to not store file-namespace-related metadata on the
> > datanodes. When a file is renamed, one would have to contact all
> > datanodes to change this metadata. Worse still, if one renames an
> > entire subdirectory, all blocks belonging to all files in the
> > subdirectory would have to be updated. Similarly, if in the future a
> > file can have multiple paths to it (links), a block may belong to two
> > filenames.
> >
> > In the future, if HDFS wants to implement any kind of de-duplication
> > (i.e. if the same block data appears in multiple files, the file
> > system can intelligently keep only one copy of that block), it will
> > be difficult to do.
> >
> > thanks,
> > dhruba
> >
> > On Wed, Sep 10, 2008 at 7:40 PM, 叶双明 <[EMAIL PROTECTED]> wrote:
> >> Thanks, Ari Rabkin!
> >>
> >> 1. I think the cost is very low: if the block size is 10 MB, 1 KB of
> >> metadata per block is only about 0.01% of the disk space.
> >>
> >> 2. Actually, if two racks are lost and replication <= 3, it seems we
> >> cannot recover all the data. But if we lose one rack out of two and
> >> replication >= 2, we can recover all the data.
> >>
> >> 3. Suppose we recover 87.5% of the data. I am not sure whether a
> >> random 87.5% of the data is useful for every user, but when most
> >> files are smaller than the block size we can recover that much data,
> >> and any recovered data may be valuable to some user.
> >>
> >> 4. I guess most small companies or organizations just have a cluster
> >> with 10-100 nodes, and they cannot afford a second HDFS cluster in a
> >> different place, or a SAN.
> >> It is a simple way to ensure data safety for them, and I think they
> >> would be pleased with it.
> >>
> >> 5. We can make it configurable: turn it on when someone needs it,
> >> and off otherwise.
> >>
> >> Glad to discuss this with you!
> >>
> >> 2008/9/11 Ariel Rabkin <[EMAIL PROTECTED]>
> >>
> >>> I don't understand this use case.
> >>>
> >>> Suppose that you lose half the nodes in the cluster. On average,
> >>> 12.5% of your blocks were exclusively stored on the half of the
> >>> cluster that's dead. For many (most?) applications, a random 87.5%
> >>> of the data isn't really useful. Storing metadata in more places
> >>> would let you turn a dead cluster into a corrupt cluster, but not
> >>> into a working one. If you need to survive major disasters, you
> >>> want a second HDFS cluster in a different place.
> >>>
> >>> The thing that might be useful to you, if you're worried about
> >>> simultaneous namenode and secondary namenode failure, is to store
> >>> the edit log and fsimage on a SAN, and get fault tolerance that way.
> >>>
> >>> --Ari
> >>>
> >>> On Tue, Sep 9, 2008 at 6:38 PM, 叶双明 <[EMAIL PROTECTED]> wrote:
> >>>> Thanks for paying attention to my tentative idea!
> >>>>
> >>>> What I had in mind isn't how to store the metadata, but a final
> >>>> (last-resort) way to recover valuable data from the cluster when
> >>>> the worst happens and the metadata on all NameNodes is destroyed,
> >>>> e.g. a terrorist attack or natural disaster destroys half of the
> >>>> cluster nodes, including all NameNodes. With this mechanism we
> >>>> could recover as much data as possible, and we would have a good
> >>>> chance of recovering all the data in the cluster because of the
> >>>> original replication.
> >>>>
> >>>> Any suggestion is appreciated!
> >>>>
> >>>> 2008/9/10 Pete Wyckoff <[EMAIL PROTECTED]>
> >>>>
> >>>>> +1 -
> >>>>>
> >>>>> From the perspective of the datanodes, DFS is just a block-level
> >>>>> store and is thus much more robust and scalable.
> >>>>>
> >>>>> On 9/9/08 9:14 AM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>>> This isn't a very stable direction. You really don't want
> >>>>>> multiple distinct methods for storing the metadata, because
> >>>>>> discrepancies are very bad. High Availability (HA) is a very
> >>>>>> important medium-term goal for HDFS, but it will likely be done
> >>>>>> using multiple NameNodes and ZooKeeper.
> >>>>>>
> >>>>>> -- Owen
> >>>
> >>> --
> >>> Ari Rabkin [EMAIL PROTECTED]
> >>> UC Berkeley Computer Science Department
> >>
> >> --
> >> Sorry for my English!! 明
> >> Please help me to correct my English expression and errors in syntax

--
Sorry for my English!! 明
Please help me to correct my English expression and errors in syntax
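P.S. For anyone curious about the de-duplication Dhruba mentioned: one common way to build it is a content-addressed block store, where blocks are keyed by a hash of their data, so identical data is stored once and a single block can belong to two filenames. A toy sketch follows; the class name, the tiny 4-byte block size, and the API are all invented for illustration and have nothing to do with how HDFS datanodes actually work.

```python
import hashlib

class DedupBlockStore:
    """Toy content-addressed block store: identical block data is kept
    once, and files reference blocks by content hash.
    (Illustrative only -- not an HDFS interface.)"""

    def __init__(self):
        self.blocks = {}   # content hash -> block bytes (stored once)
        self.files = {}    # filename -> ordered list of block hashes

    def put(self, filename, data, block_size=4):
        hashes = []
        for i in range(0, len(data), block_size):
            chunk = data[i:i + block_size]
            h = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(h, chunk)  # skip if already stored
            hashes.append(h)
        self.files[filename] = hashes

    def get(self, filename):
        return b"".join(self.blocks[h] for h in self.files[filename])

store = DedupBlockStore()
store.put("a.txt", b"AAAABBBB")
store.put("b.txt", b"BBBBCCCC")   # the "BBBB" block is shared with a.txt
print(len(store.blocks))          # 3 unique blocks stored, not 4
```

This also shows why Dhruba says namespace metadata on datanodes makes de-duplication hard: here a block has no single owning file, so mapping blocks back to filenames would require multi-valued, constantly updated reverse metadata.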