[ 
https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852652#action_12852652
 ] 

Sanjay Radia commented on HDFS-1073:
------------------------------------

This Jira proposes a simpler design for managing the fsimage and edit logs.
The fsimage and edit log scheme in current Hadoop requires coordination and 
can lead to tricky bugs (HDFS-955).


Proposed design

#       All transactions have a transaction ID. A transaction ID is a number 
that starts at zero and is incremented. Each journal record in an editsLog 
file carries its transaction ID.
#       An fsImage file is identified by the transaction ID of the last 
checkpointed transaction in the file.
** E.g. fsImage_<transactionIDofLastTransactionCheckpointed>
#       An editsLog file is identified by the transaction ID of the first 
recorded transaction in the file.
** E.g. fsEditlogs<transactionIdofFirstTransaction>
#       To start the name server:
** Load the fsImage with the greatest transaction ID N. If no image exists, 
take N to be 0.
** Process all transactions > N from the editsLogs: find the editsLog file 
that contains the transaction with ID N+1, then process all transactions 
>= N+1 from that file and all subsequent editsLog files.
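
The startup rule above can be sketched roughly as follows. This is only an 
illustration of the selection logic, not the proposed implementation: the 
class and method names (StartupPlan, latestImageTxId, editLogsToReplay) are 
hypothetical, the file-name prefixes are taken from the examples above, and 
treating a missing range of transactions as an error is my assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of the startup selection rule; not actual HDFS code. */
public class StartupPlan {
    // Prefixes follow the examples in the proposal.
    static final String IMAGE_PREFIX = "fsImage_";    // fsImage_<last checkpointed txid>
    static final String EDITS_PREFIX = "fsEditlogs";  // fsEditlogs<first txid in file>

    /** Returns the greatest txid among the image files, or 0 if no image exists. */
    static long latestImageTxId(List<String> fileNames) {
        long best = 0;
        for (String name : fileNames) {
            if (name.startsWith(IMAGE_PREFIX)) {
                long txId = Long.parseLong(name.substring(IMAGE_PREFIX.length()));
                best = Math.max(best, txId);
            }
        }
        return best;
    }

    /**
     * Returns the first-txids of the edit log files to read, in order, so that
     * all transactions > n can be replayed. Records with txid <= n inside the
     * first returned file would be skipped during replay.
     */
    static List<Long> editLogsToReplay(List<String> fileNames, long n) {
        TreeSet<Long> starts = new TreeSet<>();
        for (String name : fileNames) {
            if (name.startsWith(EDITS_PREFIX)) {
                starts.add(Long.parseLong(name.substring(EDITS_PREFIX.length())));
            }
        }
        if (starts.isEmpty()) {
            return new ArrayList<>();  // nothing to replay
        }
        // The file that can contain N+1 is the one with the greatest start <= N+1.
        Long first = starts.floor(n + 1);
        if (first == null) {
            // Assumption: a gap between the image and the oldest edit log is fatal.
            throw new IllegalStateException("no edit log covers transaction " + (n + 1));
        }
        return new ArrayList<>(starts.tailSet(first, true));
    }
}
```

With an image at transaction 250 and edit logs starting at 1, 90, 240 and 
300, only the logs starting at 240 and 300 need to be read. Nothing in the 
selection depends on where the logs were split relative to the checkpoint.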

Salient points
* This scheme does not require any synchronization between when fsImages are 
checkpointed and when editsLog files are split (although it is convenient, 
when you checkpoint at transaction ID N, to also split your edit logs at N or 
slightly less).
* This means that the NameNode and BackupNode can share images and edits 
without coordination. (This is very different from the current design.) For 
example, the primary NN can decide that it wants a checkpoint, split the 
editsLog, and ask the backup NN to do a checkpoint; the checkpoint operation 
can succeed or fail without worries. (Btw, if the split of the editsLog is 
recorded as the last transaction in the edit log, then the backup NN will see 
that transaction come across and realize that this is a convenient time to 
checkpoint.)
* The scheme does not even require coordination among the fsImage checkpoints 
themselves! For example, while the backup NN is doing a checkpoint, the NN 
could be asked to do a saveImage by the admin.
* Policies on how many edits and fsimages to keep are separable.


> Simpler model for Namenode's fs Image and edit Logs 
> ----------------------------------------------------
>
>                 Key: HDFS-1073
>                 URL: https://issues.apache.org/jira/browse/HDFS-1073
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>
> The naming and handling of the NN's fsImage and edit logs can be 
> significantly improved, resulting in simpler and more robust code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
