[ https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852652#action_12852652 ]
Sanjay Radia commented on HDFS-1073:
------------------------------------

This Jira proposes a simpler design for managing fsimage and edit logs. The edit logs and fsimage scheme in current Hadoop requires coordination and can lead to tricky bugs (HDFS-955).

Proposed design
# All transactions have a transaction ID. A transaction ID is a number that starts at zero and is incremented for each new transaction. Each journal record of the editLogs file has the transaction ID.
# An fsImage file is identified by the transaction ID of the last checkpointed transaction in the file.
** E.g. fsImage_<transactionIDofLastTransactionCheckpointed>
# An editsLog file is identified by the transaction ID of the first recorded transaction in the file.
** E.g. fsEditlogs<transactionIdofFirstTransaction>
# To start the name server,
** Load the fsImage with the greatest transactionID N. If no image exists, take N to be 0.
** Process all transactions >N from the editsLog: find the editsLog that includes the transaction with ID N+1. Process all transactions >= N+1 from that file and from all subsequent editLogs files.

Salient points
* This scheme does not require any synchronization between when fsImages are checkpointed and when editsLogs files are split (although it is convenient if, when you checkpoint at transactionID N, you also split your edits logs at N or slightly less).
* This means that the NameNode and BackupNode can share images and edits without coordination. (This is very different from the current design.) For example, the primary NN can decide that it wants a checkpoint, split the editLogs, and ask the backup NN to do a checkpoint; the checkpoint operation can succeed or fail without worries. (Btw, if the split of the editLogs is recorded as the last transaction in the edit logs, then the backup NN will see that transaction come across and realize that this is a convenient time to checkpoint.)
* The scheme does not require coordination between checkpointing fsImages themselves!
For example, while the backup NN is doing a checkpoint, the NN could be asked to do a saveImage by the admin.
* Policies on how many edits and fsimages to keep are separable.

> Simpler model for Namenode's fs Image and edit Logs
> ----------------------------------------------------
>
>                 Key: HDFS-1073
>                 URL: https://issues.apache.org/jira/browse/HDFS-1073
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>
> The naming and handling of NN's fsImage and edit logs can be significantly
> improved, resulting in simpler and more robust code.
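
To make the startup rule in the proposed design concrete, here is a minimal sketch of how the image and edit logs could be selected from file names alone. It is an illustration, not the Hadoop implementation: the function names (plan_startup, transactions_to_apply), the exact regular expressions, and the representation of journal records as (txid, operation) tuples are all assumptions made for the example. Only the naming scheme fsImage_<txid> and fsEditlogs<txid> comes from the comment above.

```python
import re
from typing import List, Optional, Tuple

# File-name patterns follow the examples in the proposal literally.
IMAGE_RE = re.compile(r"^fsImage_(\d+)$")
EDITS_RE = re.compile(r"^fsEditlogs(\d+)$")


def _parse(pattern: "re.Pattern[str]", filenames: List[str]) -> List[Tuple[int, str]]:
    """Return (txid, name) pairs for matching files, sorted by txid."""
    found = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            found.append((int(match.group(1)), name))
    return sorted(found)


def plan_startup(filenames: List[str]) -> Tuple[Optional[str], int, List[str]]:
    """Pick the image to load and the edit logs to replay.

    Returns (image_name or None, N, edit_log_names_in_replay_order).
    """
    images = _parse(IMAGE_RE, filenames)
    edits = _parse(EDITS_RE, filenames)

    # Load the fsImage with the greatest transaction ID N; N is 0 if none exists.
    if images:
        n, image = images[-1]
    else:
        n, image = 0, None

    # A log is named by its FIRST txid, so the log that can contain N+1 is the
    # one with the greatest first txid that is <= N+1.
    start = None
    for index, (first_txid, _) in enumerate(edits):
        if first_txid <= n + 1:
            start = index

    # Hypothetical simplification: if every log starts after N+1 we still
    # return them all; a real system would have to detect the missing range.
    selected = edits if start is None else edits[start:]
    return image, n, [name for _, name in selected]


def transactions_to_apply(records: List[Tuple[int, str]], n: int) -> List[Tuple[int, str]]:
    """Keep only journal records with txid > N (already-checkpointed ones are skipped)."""
    return [record for record in records if record[0] > n]
```

For a directory holding fsImage_100, fsImage_250, fsEditlogs1, fsEditlogs101, fsEditlogs240 and fsEditlogs300, this sketch selects fsImage_250 (N = 250) and replays fsEditlogs240 followed by fsEditlogs300, skipping records with IDs up to 250. Note that nothing here depends on checkpoints and log splits happening at the same transaction ID, which is the point of the proposal.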