Re: Replication problem of HDFS

Ted Dunning Mon, 10 Sep 2007 23:35:55 -0700

Your question is very hard to understand.  The confusion may come from the
names of the different kinds of servers.

There is one namenode and there are many datanodes.

Each file is divided into one or more blocks.  By default a block has a
maximum size of 64MB.  Each block from a file is stored on one or more
datanodes.  The number of datanodes holding each block is called the
replication factor.  The namenode holds information about which blocks are
in each file.  The namenode also holds information about which blocks each
datanode holds.

As an example, consider that you have 3 files called A, B, and C.  Each file
is 150MB, so each has two full-size blocks (A1, A2, B1, B2, C1, C2) and one
partial block that is 22MB in size (A3, B3, C3).
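
To make the arithmetic concrete, here is a small Python sketch of how a
150MB file ends up as two 64MB blocks plus one 22MB block.  This is only an
illustration, not Hadoop code: the function name split_into_blocks and the
naming of blocks as A1, A2, ... are assumptions made for this example.

```python
BLOCK_SIZE_MB = 64  # default maximum block size mentioned above


def split_into_blocks(name, size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return (block_name, size_mb) pairs for one file (hypothetical helper)."""
    blocks = []
    remaining = size_mb
    index = 1
    while remaining > 0:
        # Every block is full size except possibly the last one.
        this_block = min(block_size_mb, remaining)
        blocks.append((f"{name}{index}", this_block))
        remaining -= this_block
        index += 1
    return blocks


print(split_into_blocks("A", 150))
# prints [('A1', 64), ('A2', 64), ('A3', 22)]
```

The same call with "B" or "C" gives the same sizes, since all three files
are 150MB.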

Suppose that the replication factor is 1 for A, 2 for B, and 3 for C.

One possible state of five datanodes is this:

Datanode1:
A1, B2, C3, C1

Datanode2:
A2, C2, B2

Datanode3:
A3, C1, C3, B1

Datanode4:
B1, C1, C2, B3

Datanode5:
B3, C2, C3

The namenode would contain this information:

A -> (A1, A2, A3)
B -> (B1, B2, B3)
C -> (C1, C2, C3)

A1 -> (Datanode1)
B1 -> (Datanode3, Datanode4)
C1 -> (Datanode1, Datanode3, Datanode4)
  ... And so on ...
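
If it helps, the block-to-datanode table is just the datanode lists above
turned around.  The following Python sketch builds it from the example
state.  It is only a model of the bookkeeping, not how the namenode is
actually implemented; the names datanodes and block_locations are made up
for this illustration.

```python
# Example state of the five datanodes, copied from the listing above.
datanodes = {
    "Datanode1": ["A1", "B2", "C3", "C1"],
    "Datanode2": ["A2", "C2", "B2"],
    "Datanode3": ["A3", "C1", "C3", "B1"],
    "Datanode4": ["B1", "C1", "C2", "B3"],
    "Datanode5": ["B3", "C2", "C3"],
}


def block_locations(datanodes):
    """Invert datanode -> blocks into block -> datanodes (hypothetical helper)."""
    locations = {}
    for node in sorted(datanodes):
        for block in datanodes[node]:
            locations.setdefault(block, []).append(node)
    return locations


locations = block_locations(datanodes)
for block in sorted(locations):
    print(block, "->", locations[block])
```

The number of datanodes listed for each block is its replication: 1 for the
A blocks, 2 for the B blocks, and 3 for the C blocks.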

Does that help?

On 9/10/07 8:04 PM, "ChaoChun Liang" <[EMAIL PROTECTED]> wrote:

> 
> In my application, whether M blocks (described as above) exist in the name
> datanode (i.e. each database
> owns a completed M block), or shared M blocks for datanodes in the HDFS is
> important for us.
> 
> If these M blocks could be shared, we may use the HDFS, otherwise we may
> consider the local file system
> for the map/reduce processing.
