Thanks Uma.. :)

On Tue, Nov 29, 2011 at 10:48 PM, Uma Maheswara Rao G <mahesw...@huawei.com> wrote:
> Yes. :-)
> ------------------------------
> *From:* kartheek muthyala [kartheek0...@gmail.com]
> *Sent:* Tuesday, November 29, 2011 10:20 PM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Re: Generation Stamp
>
> Uma, first of all, thanks for the detailed explanation with examples.
>
> So, to confirm: the primary use of the generation stamp is to ensure
> consistency of the block? When the pipeline fails at DN3 and the client
> invokes recovery, the NN chooses DN1 to complete the pipeline. DN1 first
> updates its meta file with the new generation stamp and then passes this
> information to the other replica on DN2. Later, the NN sees that this
> block is under-replicated, assigns some other datanode DNa, and asks
> either DN1 or DN2 to replicate the block to DNa.
>
> Thanks,
> Kartheek.
>
>
> On Tue, Nov 29, 2011 at 8:10 PM, Uma Maheswara Rao G <mahesw...@huawei.com> wrote:
>
>> The generation stamp is basically there to keep track of replica states.
>>
>> Consider one scenario where the generation stamp is used:
>>
>> Create a file that has one block. The client starts writing that block to
>> DN1, DN2 and DN3 (the pipeline).
>>
>> After some data is written, DN3 fails, and the client gets an exception
>> about the pipeline failure. The client then handles that exception (you
>> can see this in processDatanodeError in the DataStreamer thread). It
>> removes DN3 and calls recovery for that block with a new generation
>> stamp. The NN then chooses one primary DN and assigns it the block
>> synchronization work. The primary DN ensures that all the remaining
>> replicas have the same block length (truncating them to a consistent
>> length if required) and invokes commitBlockSynchronization. Then the
>> remaining data transfer resumes.
>>
>> Now the block has a new generation stamp. You can observe this in the
>> metadata file for that block on the DN.
>>
>> Now the block will look like blk_12345634444 and
>> blk_12345634444_1234.meta, where 1234 is the generation stamp.
>>
>> Assume that after the write resumes, DN2 fails as well. Recovery starts
>> again and obtains another new generation stamp. Now only DN1 is in the
>> pipeline, and the block files are blk_12345634444 and
>> blk_12345634444_1235.meta. The client writes the remaining data and
>> completes the last packet; with the last packet, the block should be
>> finalized. DN1 finalizes the block successfully and sends a
>> blocks-received notification, and the block info is updated in the
>> blocks map. Now assume DN2 comes back and reports its old replica to the
>> NN. The NN can see that the generation stamp of that replica is lower
>> than the generation stamp of the replica DN1 reported, so it can decide
>> to reject the replica with the lower generation stamp.
>>
>> You can see this code in FSNamesystem#addStoredBlock. Of course, there
>> are many other conditions, such as length mismatches.
>>
>> Hope it will help you....
>>
>> Regards,
>> Uma
>>
>> ------------------------------
>> *From:* kartheek muthyala [kartheek0...@gmail.com]
>> *Sent:* Tuesday, November 29, 2011 7:44 PM
>> *To:* hdfs-user
>> *Subject:* Generation Stamp
>>
>> Hi all,
>> Why is there the concept of a generation stamp that gets tagged to the
>> metadata of a block? How is it useful? I have seen that in the HDFS
>> current directory, the meta files are tagged with this generation stamp.
>> Does this keep track of versioning?
>> ~Kartheek.
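To make the stale-replica rule from Uma's example concrete, here is a minimal Java sketch under stated assumptions. It assumes meta files are named blk_<blockId>_<genStamp>.meta, as in the example above, and it models the NameNode's blocks map as a hypothetical BlockRegistry that only remembers the highest generation stamp accepted per block. GenStampSketch, MetaName, parse, BlockRegistry and reportReplica are illustrative names, not HDFS classes or methods, and the real check in FSNamesystem#addStoredBlock handles many more conditions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified model of how a NameNode can use generation stamps
 * to reject stale replicas. Not the real HDFS API.
 */
public class GenStampSketch {

    // Assumed meta file naming: blk_<blockId>_<genStamp>.meta
    private static final Pattern META_NAME = Pattern.compile("blk_(-?\\d+)_(\\d+)\\.meta");

    /** Block ID and generation stamp parsed from a meta file name. */
    static final class MetaName {
        final long blockId;
        final long genStamp;

        MetaName(long blockId, long genStamp) {
            this.blockId = blockId;
            this.genStamp = genStamp;
        }
    }

    /** Parses a name like "blk_12345634444_1234.meta"; throws if it does not match. */
    static MetaName parse(String fileName) {
        Matcher m = META_NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a block meta file: " + fileName);
        }
        return new MetaName(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)));
    }

    /** Hypothetical blocks map: block ID -> highest generation stamp accepted so far. */
    static final class BlockRegistry {
        private final Map<Long, Long> accepted = new HashMap<>();

        /**
         * Handles a replica report. Returns false (stale, rejected) if a replica
         * with a higher generation stamp was already accepted for this block;
         * otherwise records the reported stamp and returns true.
         */
        boolean reportReplica(long blockId, long genStamp) {
            Long current = accepted.get(blockId);
            if (current != null && genStamp < current) {
                return false;
            }
            accepted.put(blockId, genStamp);
            return true;
        }

        /** Highest accepted generation stamp for the block, or -1 if none was reported. */
        long acceptedGenStamp(long blockId) {
            return accepted.getOrDefault(blockId, -1L);
        }
    }

    public static void main(String[] args) {
        BlockRegistry nn = new BlockRegistry();

        // DN1 finalized the block after the second recovery, so its meta file carries 1235.
        MetaName fromDn1 = parse("blk_12345634444_1235.meta");
        // DN2 dropped out after the first recovery, so its replica is still at 1234.
        MetaName fromDn2 = parse("blk_12345634444_1234.meta");

        System.out.println("DN1 replica accepted: " + nn.reportReplica(fromDn1.blockId, fromDn1.genStamp));
        System.out.println("DN2 replica accepted: " + nn.reportReplica(fromDn2.blockId, fromDn2.genStamp));
        System.out.println("Accepted genstamp: " + nn.acceptedGenStamp(fromDn1.blockId));
    }
}
```

Running main prints that DN1's replica is accepted, DN2's older replica is rejected, and the accepted generation stamp stays at 1235. The sketch deliberately ignores replica lengths, finalized vs. under-construction state and the other conditions Uma mentions.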