________________________________
>From: Zhanwei Wang [had...@wangzw.org]
>Sent: Wednesday, November 30, 2011 4:34 PM
>To: hdfs-user@hadoop.apache.org
>Subject: Re: Generation Stamp

>Hi, everyone

>Following the discussion, I would like to know: if a DataNode reports an 
>outdated block to the NameNode and, according to Uma, the NameNode can reject 
>it, what will the DataNode do then?
  The NN will add it to the invalidates list and inform the DN through heartbeat 
responses. The DataNode's next action will be to physically delete that block.
>Ask another DataNode to copy a new replica to it and delete the old one? Or 
>will the NameNode arrange the work if the number of replicas is below the 
>specified value? Where can I find this code?
When the NN replication monitor finds this block in the neededReplications list, 
it will choose one source node (one that has a good replica) and ask it to 
replicate the block to another DataNode to meet the replication factor.
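
To illustrate the re-replication step described above, here is a minimal sketch in Java. It is not the actual NameNode code: the class ReplicationSketch, the method scheduleReplication and its parameters are hypothetical names made up for this example, and choosing the first available target is a simplifying assumption (the real NameNode uses a block placement policy).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not HDFS source code.
final class ReplicationSketch {

    // Returns {source, target} if the block is under-replicated and a target
    // exists, otherwise an empty Optional.
    static Optional<String[]> scheduleReplication(
            Set<String> goodReplicaHolders, int targetReplication, List<String> liveNodes) {
        // Nothing to do if no good replica exists or replication is already met.
        if (goodReplicaHolders.isEmpty() || goodReplicaHolders.size() >= targetReplication) {
            return Optional.empty();
        }
        // Assumption: any node holding a good replica may serve as the source.
        String source = goodReplicaHolders.iterator().next();
        for (String candidate : liveNodes) {
            // Assumption: the first live node without a replica becomes the target.
            if (!goodReplicaHolders.contains(candidate)) {
                return Optional.of(new String[] {source, candidate});
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> holders = new TreeSet<>(Arrays.asList("DN1"));
        List<String> live = Arrays.asList("DN1", "DN2", "DN3");
        scheduleReplication(holders, 3, live)
            .ifPresent(plan -> System.out.println(plan[0] + " -> " + plan[1]));
    }
}
```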

Hope it helps you....

>Thanks
>Zhanwei Wang


From: hdfs-user-return-1831-hadoop=wangzw....@hadoop.apache.org 
[mailto:hdfs-user-return-1831-hadoop=wangzw....@hadoop.apache.org] On behalf of kartheek 
muthyala
Sent: November 30, 2011 12:07
To: hdfs-user@hadoop.apache.org
Subject: Re: Generation Stamp

Thanks Uma..:)
On Tue, Nov 29, 2011 at 10:48 PM, Uma Maheswara Rao G 
<mahesw...@huawei.com> wrote:
Yes. :-)
________________________________
From: kartheek muthyala [kartheek0...@gmail.com]
Sent: Tuesday, November 29, 2011 10:20 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Generation Stamp
Uma, first of all, thanks for the detailed explanation with examples.

So to confirm, the primary use of the generation stamp is to ensure 
consistency of the block? So, when the pipeline fails at DN3 and the 
client invokes recovery, the NN will choose DN1 to complete the pipeline. 
DN1 first updates its metafile with the new generation stamp, and then passes 
this information to the other replica at DN2. Later, the NN sees 
that this particular block is under-replicated, assigns some other node DNa, 
and asks either DN1 or DN2 to replicate the block to DNa.


Thanks,
Kartheek.

On Tue, Nov 29, 2011 at 8:10 PM, Uma Maheswara Rao G 
<mahesw...@huawei.com> wrote:

The generation stamp is basically used to keep track of replica states.

Consider one scenario where the generation stamp will be used:

Create a file which has one block. The client starts writing that block to DN1, 
DN2, DN3 (the pipeline).

After some data has been written, DN3 fails, and the client gets an exception 
about the pipeline failure. The client handles that exception (you can see this 
in processDatanodeError in the DataStreamer thread). It removes DN3 and calls 
recovery for that block with a new generation stamp; the NN then chooses one 
primary DN and assigns it the block synchronization work. The primary DN 
ensures that all the remaining replicas have the same length (if required, it 
truncates them to a consistent length) and invokes commitBlockSynchronization. 
Then the remaining data transfer resumes.
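
The synchronization step above could be sketched roughly as follows. This is only an illustration, not the DataNode implementation: the class BlockSyncSketch, the Replica type and the synchronize method are hypothetical, and agreeing on the minimum replica length is an assumption made here to model "truncate to a consistent length".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HDFS source code.
final class BlockSyncSketch {

    // Minimal stand-in for a replica held by one DataNode.
    static final class Replica {
        final String node;
        long length;
        long genStamp;

        Replica(String node, long length, long genStamp) {
            this.node = node;
            this.length = length;
            this.genStamp = genStamp;
        }
    }

    // Brings all surviving replicas to one length and stamps them with the
    // new generation stamp. Returns the agreed length.
    static long synchronize(List<Replica> survivors, long newGenStamp) {
        if (survivors.isEmpty()) {
            throw new IllegalArgumentException("no surviving replicas");
        }
        // Assumption: the shortest surviving replica defines the consistent length.
        long agreed = Long.MAX_VALUE;
        for (Replica r : survivors) {
            agreed = Math.min(agreed, r.length);
        }
        for (Replica r : survivors) {
            r.length = agreed;        // truncate longer replicas
            r.genStamp = newGenStamp; // old stamps now mark stale copies
        }
        return agreed;
    }

    public static void main(String[] args) {
        List<Replica> survivors = new ArrayList<>();
        survivors.add(new Replica("DN1", 4096, 1233));
        survivors.add(new Replica("DN2", 3072, 1233));
        long agreed = synchronize(survivors, 1234);
        System.out.println("agreed length = " + agreed);
    }
}
```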



Now the block will have a new generation stamp. You can observe this in the 
metadata file for that block on the DN.



Now the block files will be blk_12345634444 and 
blk_12345634444_1234.meta.

Here 1234 is the generation stamp.
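
As a small illustration of that naming convention, the sketch below extracts the block ID and generation stamp from a metafile name. The class MetaFileName and its methods are hypothetical helpers written for this example, not part of HDFS; the pattern assumes names of the form blk_<blockId>_<generationStamp>.meta as shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not HDFS source code.
final class MetaFileName {

    // Assumed layout: blk_<blockId>_<generationStamp>.meta
    // The block ID is allowed a leading minus sign as a precaution.
    private static final Pattern META = Pattern.compile("blk_(-?\\d+)_(\\d+)\\.meta");

    private static Matcher match(String name) {
        Matcher m = META.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a block metafile name: " + name);
        }
        return m;
    }

    static long blockId(String name) {
        return Long.parseLong(match(name).group(1));
    }

    static long generationStamp(String name) {
        return Long.parseLong(match(name).group(2));
    }

    public static void main(String[] args) {
        String name = "blk_12345634444_1234.meta";
        System.out.println(blockId(name) + " has generation stamp " + generationStamp(name));
    }
}
```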

Assume that after the write resumes, DN2 fails. Recovery starts again and the 
block gets a new generation stamp again. Now only DN1 is in the pipeline, and 
the block files are blk_12345634444 and blk_12345634444_1235.meta. The client 
resumes the remaining data writes and completes the last packet. With the last 
packet, the block should be finalized. DN1 finalizes the block successfully and 
sends a block-received message to the NN, and the block info is updated in the 
blocks map. Now assume DN2 comes back and sends that old block in its block 
report to the NN. The NN can see that the generation stamp of that block is 
lower than the generation stamp of the block reported by DN1, so it can now 
make the decision: it can reject the block with the lower generation stamp.
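
The decision described above can be pictured with the following sketch. It is a simplified model rather than the logic in FSNamesystem: the class StaleReplicaCheck, its fields and its methods are hypothetical, and the real code checks more conditions than the generation stamp alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not HDFS source code.
final class StaleReplicaCheck {

    // Stand-in for the blocks map: block ID -> latest known generation stamp.
    private final Map<Long, Long> blocksMap = new HashMap<>();
    // Stand-in for the invalidates list: replicas the DataNodes must delete.
    private final List<String> invalidates = new ArrayList<>();

    // Called when a DataNode reports a finalized block (block received).
    void recordFinalized(long blockId, long genStamp) {
        blocksMap.merge(blockId, genStamp, Math::max);
    }

    // Called for each block in a block report. Returns false if the replica
    // is rejected because its generation stamp is older than the known one.
    boolean acceptReported(String dataNode, long blockId, long reportedGenStamp) {
        Long current = blocksMap.get(blockId);
        if (current != null && reportedGenStamp < current) {
            invalidates.add(dataNode + ":blk_" + blockId + "_" + reportedGenStamp);
            return false;
        }
        return true;
    }

    List<String> invalidates() {
        return invalidates;
    }

    public static void main(String[] args) {
        StaleReplicaCheck nn = new StaleReplicaCheck();
        nn.recordFinalized(12345634444L, 1235);                  // DN1 finalized the block
        boolean ok = nn.acceptReported("DN2", 12345634444L, 1234); // DN2 returns with old copy
        System.out.println("accepted=" + ok + " invalidates=" + nn.invalidates());
    }
}
```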



You can see this code in FSNamesystem#addStoredBlock. Of course, there are 
many other conditions, such as length mismatch, etc.



Hope it will help you....



Regards,

Uma





________________________________
From: kartheek muthyala [kartheek0...@gmail.com]
Sent: Tuesday, November 29, 2011 7:44 PM
To: hdfs-user
Subject: Generation Stamp
Hi all,
Why is there a concept of a generation stamp that gets tagged onto the 
metadata of a block? How is it useful? I have seen that in the HDFS current 
directory, the metafiles are tagged with this generation stamp. Does it keep 
track of versioning?
~Kartheek.

Regards,
Uma
