The generation stamp is basically there to keep track of replica states.

Consider one scenario where the generation stamp will be used:

Create a file which has one block. The client starts writing that block to
DN1, DN2 and DN3 (the pipeline).

After some data has been written, DN3 fails, and the client gets an exception
about the pipeline failure. The client handles that exception (you can see
this in processDatanodeError in the DataStreamer thread). It removes DN3 from
the pipeline and calls recovery for that block with a new generation stamp.
The NN then chooses one primary DN and assigns it the block synchronization
work. The primary DN ensures that the remaining replicas all have the same
length (if required, it truncates them to a consistent length) and invokes
commitBlockSynchronization. Then the remaining data transfer resumes.
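
To make the synchronization step concrete, here is a minimal sketch. It is
not the HDFS source: the class PipelineRecoverySketch, the Replica type and
the synchronize method are hypothetical names, the lengths and the starting
stamp 1233 are made up, and using the shortest replica as the consistent
length is an assumption on my side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the HDFS source: models what block
// synchronization achieves on the replicas that survive a pipeline failure.
final class PipelineRecoverySketch {

    // Minimal stand-in for one replica of a block on a datanode.
    static final class Replica {
        final String datanode;
        long length;
        long generationStamp;

        Replica(String datanode, long length, long generationStamp) {
            this.datanode = datanode;
            this.length = length;
            this.generationStamp = generationStamp;
        }
    }

    // Truncates every surviving replica to the shortest length (an
    // assumption) and stamps each one with the new generation stamp.
    static void synchronize(List<Replica> survivors, long newGenerationStamp) {
        if (survivors.isEmpty()) {
            throw new IllegalArgumentException("no surviving replicas");
        }
        long consistentLength = Long.MAX_VALUE;
        for (Replica r : survivors) {
            consistentLength = Math.min(consistentLength, r.length);
        }
        for (Replica r : survivors) {
            r.length = consistentLength;
            r.generationStamp = newGenerationStamp;
        }
    }

    public static void main(String[] args) {
        // DN3 has failed; DN1 and DN2 remain with different lengths.
        List<Replica> survivors = new ArrayList<>();
        survivors.add(new Replica("DN1", 4096, 1233));
        survivors.add(new Replica("DN2", 3584, 1233));
        synchronize(survivors, 1234);
        for (Replica r : survivors) {
            System.out.println(r.datanode + " length=" + r.length
                    + " genstamp=" + r.generationStamp);
        }
    }
}
```

After the call, both surviving replicas have the same length and carry the
new stamp, which is the state the write resumes from.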



Now the block has a new generation stamp. You can observe this in the
metadata file for that block on the DN.



Now the block files will look like blk_12345634444 and
blk_12345634444_1234.meta.

Here 1234 is the generation stamp.
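
If you want to read the generation stamp off a metadata file name yourself,
something like the following sketch would do. The class MetaFileNameSketch
and its methods are hypothetical helpers, not part of Hadoop, and the pattern
only assumes the blk_<id>_<genstamp>.meta naming shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: splits a block metadata file
// name of the form blk_<blockId>_<generationStamp>.meta into its parts.
final class MetaFileNameSketch {

    // Group 1 is the block id, group 2 is the generation stamp.
    private static final Pattern META_NAME =
            Pattern.compile("blk_(-?\\d+)_(\\d+)\\.meta");

    private static Matcher match(String fileName) {
        Matcher m = META_NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                    "not a block metadata file name: " + fileName);
        }
        return m;
    }

    static long blockIdOf(String fileName) {
        return Long.parseLong(match(fileName).group(1));
    }

    static long generationStampOf(String fileName) {
        return Long.parseLong(match(fileName).group(2));
    }

    public static void main(String[] args) {
        String name = "blk_12345634444_1234.meta";
        System.out.println("block id  = " + blockIdOf(name));
        System.out.println("genstamp  = " + generationStampOf(name));
    }
}
```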

Now assume that, after the write resumes, DN2 fails. Recovery starts again
and the block gets a new generation stamp again. Now only DN1 is in the
pipeline, and the block files are blk_12345634444 and
blk_12345634444_1235.meta. The client resumes writing the remaining data and
completes the last packet. With the last packet the block should be
finalized. DN1 finalizes the block successfully and sends the block received
notification to the NN, and the block info is updated in the blocks map. Now
assume DN2 comes back and sends that old block to the NN in its block
reports. The NN can see that the generation stamp of that block is lower than
the generation stamp of the block reported by DN1, so it can make a decision:
it can reject the block with the lower generation stamp.
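
The comparison on the NN side can be sketched roughly as below. This is only
an illustration of the idea, not the actual NN code: StaleReplicaSketch,
blockReceived and isStale are hypothetical names, and the real logic checks
more conditions than the generation stamp.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the NameNode source: remembers the newest
// generation stamp seen per block id and flags older replicas as stale.
final class StaleReplicaSketch {

    private final Map<Long, Long> latestGenStampByBlockId = new HashMap<>();

    // Called when a datanode reports a finalized replica; keeps the
    // highest generation stamp seen so far for that block id.
    void blockReceived(long blockId, long generationStamp) {
        latestGenStampByBlockId.merge(blockId, generationStamp, Long::max);
    }

    // A reported replica is stale when its generation stamp is lower than
    // the one already recorded for the same block id.
    boolean isStale(long blockId, long reportedGenerationStamp) {
        Long latest = latestGenStampByBlockId.get(blockId);
        return latest != null && reportedGenerationStamp < latest;
    }

    public static void main(String[] args) {
        StaleReplicaSketch nn = new StaleReplicaSketch();
        // DN1 finalized the block after the second recovery.
        nn.blockReceived(12345634444L, 1235L);
        // DN2 comes back and reports the replica from before that recovery.
        System.out.println("DN2 replica stale: "
                + nn.isStale(12345634444L, 1234L));
        System.out.println("DN1 replica stale: "
                + nn.isStale(12345634444L, 1235L));
    }
}
```

In this model DN2's replica with stamp 1234 is flagged as stale, while DN1's
replica with stamp 1235 is not.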



You can see this code in FSNamesystem#addStoredBlock. Of course, there are
many other conditions there, such as length mismatch, etc.



Hope this helps.



Regards,

Uma





________________________________
From: kartheek muthyala [kartheek0...@gmail.com]
Sent: Tuesday, November 29, 2011 7:44 PM
To: hdfs-user
Subject: Generation Stamp

Hi all,
Why is there the concept of a Generation Stamp that gets tagged to the
metadata of the block? How is it useful? I have seen that in the hdfs current
directory, the metafiles are tagged with this generation stamp. Does this keep
track of the versioning?
~Kartheek.
