Ronen,
On file write, HDFS replicates each block through a pipeline: datanode 1
receives the data and passes it on to datanode 2, and so on. This limits
network traffic between the client node and the datanodes, since the client
only writes to one of them.

The ACK for a packet is returned to the client only once all datanodes in
the pipeline have received that packet.

However, if a datanode in the write pipeline fails in the interim AND the
minimum replication threshold has been met (normally 1), the namenode will
make up the replica deficit later, in a separate operation.
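
To make the ACK and failure behaviour described above concrete, here is a
toy model in Python. It is only a sketch of the semantics, not Hadoop code:
DataNode, write_packet, replica_deficit and the min_replication parameter are
names made up for illustration, and the real pipeline recovery is more
involved than dropping a dead node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataNode:
    """Hypothetical stand-in for a datanode in the write pipeline."""
    name: str
    alive: bool = True
    packets: List[bytes] = field(default_factory=list)


def write_packet(pipeline: List[DataNode], packet: bytes,
                 min_replication: int = 1) -> List[DataNode]:
    """Send one packet down the pipeline and return the nodes that stored it.

    The returned list plays the role of the ACK: it is produced only after
    every surviving node has stored the packet. Failed nodes are dropped,
    and the write fails if fewer than min_replication nodes remain.
    """
    survivors = [dn for dn in pipeline if dn.alive]
    if len(survivors) < min_replication:
        raise IOError("fewer live datanodes than min_replication")
    for dn in survivors:  # datanode 1 -> datanode 2 -> ...
        dn.packets.append(packet)
    return survivors


def replica_deficit(target_replication: int, acked: List[DataNode]) -> int:
    """Replicas the namenode would still have to create later."""
    return max(0, target_replication - len(acked))


if __name__ == "__main__":
    nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
    write_packet(nodes, b"packet-0")          # all three nodes store it
    nodes[1].alive = False                    # dn2 fails mid-write
    acked = write_packet(nodes, b"packet-1")  # write continues on dn1, dn3
    print([dn.name for dn in acked], replica_deficit(3, acked))
```

In this model the client still waits for every live node before it sees the
ACK, and the missing third replica is left for a later, separate step.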

I don't think that's configurable; however, it would be an interesting use
case for speeding up writes while trading off some reliability.

EF

On Wed, Oct 5, 2011 at 1:53 AM, Ronen Itkin <ro...@taykey.com> wrote:

> Hi all!
>
> My question is regarding hdfs block replication.
> From the perspective of the client, does the application receive an ACK for
> a certain packet after it was written on the first Hadoop data node in the
> pipeline, or after the packet is *replicated* to all assigned *replication*
> nodes?
>
> More generally, does Hadoop's HDFS block replication work synchronously or
> asynchronously?
>
> synchronously --> more replicas = a decrease in write performance
> (the client has to wait until every packet is written to all replication
> nodes before it receives an ACK).
> asynchronously --> more replicas have no influence on write performance
> (the client receives an ACK after the write to the first datanode
> finishes, and HDFS completes the replication in its own time).
>
> Synchronous / asynchronous block replication - is it something
> configurable? If it is, then how can I do it?
>
> Thanks!
>
> --
> *Ronen Itkin*
> Taykey | www.taykey.com
>
