[
https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852652#action_12852652
]
Sanjay Radia commented on HDFS-1073:
------------------------------------
This Jira proposes a simpler design for managing fsimage and edit logs.
The edit logs and fsimage scheme in current Hadoop requires coordination and
can lead to tricky bugs (HDFS-955).
Proposed design
# All transactions have a transaction ID. A transaction ID is a number
that starts at zero and is incremented for each new transaction. Each journal
record in an editsLog file carries its transaction ID.
# An fsImage file is identified by the transaction ID of the last
checkpointed transaction in the file.
** E.g. fsImage_<transactionIDofLastTransactionCheckpointed>
# An editsLog file is identified by the transaction ID of the first
recorded transaction in the file.
** E.g. fsEditlogs<transactionIdofFirstTransaction>
# To start the name server,
** Load the fsImage with the greatest transaction ID N. If no image exists,
take N to be 0.
** Process all transactions >N from the editsLog files: find the editsLog
file that includes the transaction with ID N+1, then process all transactions
>= N+1 from that file and from all subsequent editsLog files.
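The startup steps above can be sketched as follows. This is only an illustration of the proposed rule, not NameNode code: the function name `recover` and the in-memory representation (a list of image transaction IDs, and a dict keyed by each edits file's first transaction ID) are assumptions made for the example.

```python
from bisect import bisect_right


def recover(image_txids, edit_logs):
    """Return (N, transaction IDs to replay) per the proposed startup rule.

    image_txids: last-checkpointed transaction ID of each fsImage found.
    edit_logs:   dict mapping each edits file's first transaction ID to the
                 ordered list of transaction IDs recorded in that file.
    Both structures are hypothetical stand-ins for files on disk.
    """
    # Load the image with the greatest transaction ID; N is 0 if none exists.
    n = max(image_txids, default=0)

    starts = sorted(edit_logs)
    if not starts:
        return n, []

    # The file that includes N+1 is the one with the greatest first
    # transaction ID that is <= N+1.
    idx = bisect_right(starts, n + 1) - 1
    if idx < 0:
        # Every edits file starts after N+1, so transaction N+1 is missing.
        raise ValueError(f"no edits file covers transaction {n + 1}")

    # Replay everything > N from that file and all subsequent files.
    replay = [t for s in starts[idx:] for t in edit_logs[s] if t > n]
    return n, replay


# Images checkpointed at 100 and 250; edits files split at 1, 201 and 301.
logs = {1: list(range(1, 201)), 201: list(range(201, 301)), 301: list(range(301, 311))}
n, replay = recover([100, 250], logs)
print(n, replay[0], replay[-1])
```

Note that the image at 250 falls in the middle of the edits file that starts at 201; the checkpoint and the split do not have to line up.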
Salient points
* This scheme does not require any synchronization between when fsImages are
checkpointed and when editsLog files are split (although it is convenient,
when you checkpoint at transaction ID N, to also split your edit logs at N or
slightly before).
* This means that the NameNode and BackupNode can share images and edits
without coordination. (This is very different from the current design.) For
example, the primary NN can decide that it wants a checkpoint, and hence split
the edit logs and ask the backup NN to do a checkpoint; the checkpoint
operation can then succeed or fail without worries. (Btw, if the split of the
edit logs is recorded as the last transaction in the edit log, then the backup
NN will see that transaction come across and realize that this is a convenient
time to checkpoint.)
* The scheme does not require coordination between checkpointing fsImages
themselves! For example, while the backup NN is doing a checkpoint, the NN
could be asked to do a saveImage by the admin.
* Policies on how many edits and how many fsImages to keep are separable.
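As a rough illustration of the last point, an edits retention decision can be computed from transaction IDs alone. The sketch below shows one hypothetical policy, which is not part of the proposal: keep only the edits files needed to recover from the oldest image that is still retained. The function name `purgeable_edits` and its arguments are assumptions made for the example.

```python
from bisect import bisect_right


def purgeable_edits(edit_starts, oldest_kept_image_txid):
    """Return first-transaction IDs of edits files that recovery from the
    oldest retained image would never read (hypothetical policy)."""
    starts = sorted(edit_starts)
    # Recovery from image N begins in the file that includes N+1: the file
    # with the greatest first transaction ID that is <= N+1.
    idx = bisect_right(starts, oldest_kept_image_txid + 1) - 1
    # Every file before that one holds only transactions <= N.
    return starts[:max(idx, 0)]


print(purgeable_edits([1, 201, 301], 250))
```

A site that prefers to keep more history can simply retain extra edits files; the startup rule is unaffected, because it only looks for the file that includes N+1.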
> Simpler model for Namenode's fs Image and edit Logs
> ----------------------------------------------------
>
> Key: HDFS-1073
> URL: https://issues.apache.org/jira/browse/HDFS-1073
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Sanjay Radia
>
> The naming and handling of NN's fsImage and edit logs can be significantly
> improved, resulting in simpler and more robust code.