Thanks a lot!!

On Wed, Oct 5, 2011 at 3:51 PM, Eric Fiala <e...@fiala.ca> wrote:

> Ronen,
> On file write, HDFS's block replication pipeline is asynchronous - datanode 1
> gets a block before passing it on to datanode 2, and so on (limiting network
> traffic between the client node and the data nodes - it only writes to one).
>
> The ACK for a packet is returned only once all datanodes in the pipeline
> have copied the packet.
>
> However, if a failure occurs in the interim on a datanode in the write
> pipeline, AND the minimum replication threshold has been met (normally 1) -
> the namenode will, in a separate operation, quell the replica deficit.
>
> I don't think that's configurable; however, it would be an interesting use
> case for speeding up writes, while trading off some reliability.
>
> EF
>
> On Wed, Oct 5, 2011 at 1:53 AM, Ronen Itkin <ro...@taykey.com> wrote:
>
> > Hi all!
> >
> > My question is regarding hdfs block replication.
> > From the perspective of the client, does the application receive an ACK
> > for a certain packet after it is written to the first
> > Hadoop data node in the pipeline, or after the packet is *replicated* to
> > all assigned *replication* nodes?
> >
> > More generally, does Hadoop's HDFS block replication work synchronously
> > or asynchronously?
> >
> > synchronously --> more replicas = decreased write performance
> > (the client has to wait until every packet is written to all replication
> > nodes before it receives an ACK).
> > asynchronously --> more replicas have no influence on write performance
> > (the client receives an ACK after the first write to the first datanode
> > finishes; HDFS completes the replication in its own time).
> >
> > Synchronous / asynchronous block replication - is it something
> > configurable? If it is, then how can I do it?
> >
> > Thanks!
> >
> > --
> > *
> > Ronen Itkin*
> > Taykey | www.taykey.com
> >
>
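
To make the ACK behaviour described in the quoted reply concrete, here is a
minimal Python sketch. It is a toy model under stated assumptions, not Hadoop
code: the DataNode class, the write_packet function and the min_replication
parameter are all hypothetical names made up for illustration. It only shows
the rule as explained above: the client is ACKed once every surviving datanode
in the pipeline holds the packet, provided the minimum replication threshold
is met, and any shortfall is left for a later, separate re-replication step.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy stand-in for a datanode (hypothetical, not a Hadoop class)."""
    name: str
    alive: bool = True
    packets: list = field(default_factory=list)

    def receive(self, packet: bytes) -> bool:
        # A failed node stores nothing and cannot acknowledge.
        if not self.alive:
            return False
        self.packets.append(packet)
        return True


def write_packet(pipeline, packet, min_replication=1):
    """Pass one packet down the pipeline, node by node.

    Returns (acked, deficit):
      acked   - True if the client would get an ACK for this packet
      deficit - number of replicas still missing, to be made up later
    """
    stored = 0
    for node in pipeline:
        # Simplification: a failed node is skipped and the packet goes
        # on to the next node in the pipeline.
        if node.receive(packet):
            stored += 1
    acked = stored >= min_replication
    deficit = len(pipeline) - stored
    return acked, deficit


if __name__ == "__main__":
    nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
    print(write_packet(nodes, b"packet-0"))  # (True, 0)

    nodes[1].alive = False                   # dn2 fails mid-write
    print(write_packet(nodes, b"packet-1"))  # (True, 1)
```

In the second call the client still gets its ACK because two replicas exist,
which is above the assumed threshold of 1; the deficit of 1 stands for the
replica that would be re-created afterwards in a separate operation.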



-- 
*
Ronen Itkin*
Taykey | www.taykey.com
