[jira] Commented: (HDFS-1073) Simpler model for Namenode's fs Image and edit Logs

Todd Lipcon (JIRA) Fri, 05 Nov 2010 12:05:11 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928764#action_12928764
 ]


Todd Lipcon commented on HDFS-1073:
-----------------------------------

Hey all. Back in town after a few weeks in Japan, sorry for the relative 
absence.

bq. I do not see or did not understand the rationale for the "I'm quitting!" 
record. Why should NN care whether last record was lost or not, just keep going 
with what it has. Worked so far.

I think one complication here is that we currently never have to re-open an 
edits file for append: on startup, if there were any edits to apply, we always 
save a "fresh" checkpoint image and an empty "edits" file. One advantage of the 
new design is that we no longer have to do this. We just bump the edits log 
number to the next one in sequence, i.e. we roll on startup if the latest edit 
log is non-empty.
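
To make the startup rule concrete, here is a minimal sketch of it. The class 
and method names are hypothetical and are not taken from the patch, and 
reusing an empty latest log is my assumption about the other branch.

```java
/** Hypothetical sketch; these names are not taken from the HDFS-1073 patch. */
final class StartupRollSketch {
    private StartupRollSketch() {}

    /**
     * Returns the number of the edit log to write to after startup.
     * A non-empty latest log is left untouched and the next number in
     * sequence is used. Reusing an empty latest log is an assumption.
     */
    static long logNumberToWrite(long latestLogNumber, long latestLogLengthBytes) {
        if (latestLogLengthBytes > 0) {
            // Roll on startup: the old file is never re-opened for append.
            return latestLogNumber + 1;
        }
        return latestLogNumber;
    }
}
```

With this rule, a latest log numbered 7 that holds edits leads to writing log 
8, while an empty log 7 is simply written to.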

bq. Also the "rolled" transaction is a nice way to tell the BN that the 
primary did a roll without any special message from NN to BNN

The patch currently does exactly that - we just don't write down the special 
"roll" entry in any file streams. We certainly could, though, if it's useful to 
know that a file was completely written.

bq. Todd, I briefly looked at the patch. It looks like you are trying to get 
rid of the Journal Spool in BN. Correct me if I am wrong. I don't think you can

In the patch, the spooling has just become a bit more of a general case. Rather 
than spooling to a special file, we simply ask the primary NN to roll, and then 
wait for the roll to happen. While waiting for the roll, we continue to apply 
edits. Once we get the special "roll" record, we stop applying edits and make a 
checkpoint at that point. Once the checkpoint completes, we "converge" by 
continuing to read forward in the sequence of log files until we hit the end 
and are back "in sync".
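
Roughly, the flow looks like the sketch below. This is a simplified, 
single-threaded model and not the patch's code: the class, the "ROLL" marker 
string, and the idea of modeling the namespace as a list of applied edits are 
all hypothetical. Because it is single-threaded, the checkpoint is taken 
synchronously when the roll record is seen, and "converging" is just the loop 
continuing over the remaining records.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the backup node's checkpoint flow; not the patch's API. */
final class BackupCheckpointSketch {
    /** Made-up marker standing in for the special "roll" record. */
    static final String ROLL = "ROLL";

    /** Namespace state, modeled as the ordered list of edits applied so far. */
    final List<String> applied = new ArrayList<>();

    /** Copy of the state taken at the first roll record; null until then. */
    List<String> checkpoint = null;

    /** Consumes the edit stream in order, as the BN would after asking the NN to roll. */
    void consume(List<String> stream) {
        for (String record : stream) {
            if (ROLL.equals(record)) {
                if (checkpoint == null) {
                    // Checkpoint exactly at the roll boundary.
                    checkpoint = new ArrayList<>(applied);
                }
                continue; // the roll record itself is not a namespace edit
            }
            // Before the roll this is normal edit application;
            // after the checkpoint it is the "converge" phase.
            applied.add(record);
        }
    }
}
```

Feeding it two edits, a roll record, and one more edit leaves a checkpoint 
containing the first two edits and a live state containing all three.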

bq. A backup NN should not ask for a roll. The primary should roll when it 
feels it is necessary.

I think the simplest approach is to let anyone ask for a roll, i.e. CN, BN, or 
NN. The NN of course is the one that actually makes the decision, but the 
decision may be in response to a request from one of the other nodes. I think 
this ability is useful not just for CN, BN, and NN, but also, for example, in 
backup scripts: you may ask the NN to roll right before making a tarball of the 
edits directory, and thus be sure that you get all of the current edits in 
"finalized" files.




> Simpler model for Namenode's fs Image and edit Logs 
> ----------------------------------------------------
>
>                 Key: HDFS-1073
>                 URL: https://issues.apache.org/jira/browse/HDFS-1073
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>            Assignee: Todd Lipcon
>         Attachments: hdfs-1073.txt, hdfs1073.pdf
>
>
> The naming and handling of NN's fsImage and edit logs can be significantly 
> improved, resulting in simpler and more robust code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
