[ https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928764#action_12928764 ]
Todd Lipcon commented on HDFS-1073:
-----------------------------------

Hey all. Back in town after a few weeks in Japan, sorry for the relative absence.

bq. I do not see or did not understand the rational for "I'm quitting!" record. Why should NN care whether last record was lost or not, just keep going with what it has. Worked so far.

I think one complication here is that we currently never have to re-open an edits file for append: when we start, we always save a "fresh" checkpoint image and an empty "edits" file if there were any edits to apply. One advantage of the new design is that we no longer have to do this - we just bump the edit log number to the next one in sequence, i.e. we roll on startup if the latest edit log is non-empty.

bq. Also the "rolled" transaction is a nice way to to tell the BN that the primary did a roll without any special message from NN to BNN

The patch currently does exactly that - we just don't write the special "roll" entry to any file streams. We certainly could, though, if it's useful to know that a file was completely written.

bq. Todd, I briefly looked at the patch. It looks like you are trying to get rid of the Journal Spool in BN. Correct me if I am wrong. I don't think you can

In the patch, the spooling has just become a bit more of a general case. Rather than spooling to a special file, we simply ask the primary NN to roll, and then wait for the roll to happen. While waiting for the roll, we continue to apply edits. Once we get the special "roll" record, we stop applying edits and make a checkpoint at that point. Once the checkpoint completes, we "converge" by continuing to read forward in the sequence of log files until we hit the end and are back "in sync".

bq. A backup NN should not ask for a roll. The primary should roll when it feels it is necessary.

I think the simplest will be if anyone may ask for a roll - i.e. the CN, BN, or NN.
The NN of course is the one that actually makes the decision, but the decision may be in response to a request from one of the other nodes. I think this ability is useful not just for CN, BN, and NN, but also for example in backup scripts - you may ask the NN to roll right before making a tarball of the edits directory, and thus be sure that you get all of the current edits in "finalized" files.

> Simpler model for Namenode's fs Image and edit Logs
> ----------------------------------------------------
>
>                 Key: HDFS-1073
>                 URL: https://issues.apache.org/jira/browse/HDFS-1073
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>            Assignee: Todd Lipcon
>         Attachments: hdfs-1073.txt, hdfs1073.pdf
>
>
> The naming and handling of NN's fsImage and edit logs can be significantly
> improved resulting simpler and more robust code.
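
For readers following the thread, the roll-on-startup, roll-on-request, and checkpoint-then-converge behaviour described in the comment can be modelled roughly as below. This is only a toy sketch in Python, not code from the patch: `Segment`, `EditLog`, `open_on_startup`, and `checkpoint_and_converge` are hypothetical names invented for illustration, and the real NN/BN logic involves streams, storage directories, and RPCs that are not modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """One numbered edit log file (hypothetical model, not HDFS code)."""
    seq: int
    edits: List[str] = field(default_factory=list)
    finalized: bool = False


class EditLog:
    """Toy model of a sequence of numbered edit log segments."""

    def __init__(self, segments: Optional[List[Segment]] = None):
        self.segments = list(segments) if segments else [Segment(seq=1)]

    @property
    def current(self) -> Segment:
        return self.segments[-1]

    def append(self, edit: str) -> None:
        self.current.edits.append(edit)

    def roll(self) -> int:
        """Finalize the current segment and open the next one in sequence."""
        old = self.current
        old.finalized = True
        self.segments.append(Segment(seq=old.seq + 1))
        return self.current.seq

    def open_on_startup(self) -> int:
        """Roll if the latest segment is non-empty, so it is never reopened for append."""
        if self.current.edits:
            self.roll()
        return self.current.seq

    def finalized_segments(self) -> List[int]:
        return [s.seq for s in self.segments if s.finalized]


def checkpoint_and_converge(log: EditLog, roll_seq: int) -> Tuple[List[str], List[str]]:
    """Split edits into those covered by a checkpoint taken at the roll
    (segments before roll_seq) and those still to be read to get back in sync."""
    checkpoint = [e for s in log.segments if s.seq < roll_seq for e in s.edits]
    catch_up = [e for s in log.segments if s.seq >= roll_seq for e in s.edits]
    return checkpoint, catch_up


# Example: restart with a non-empty log, then a roll requested by another node.
log = EditLog([Segment(seq=1, edits=["mkdir /a"])])
log.open_on_startup()            # latest segment is non-empty, so roll to segment 2
log.append("create /a/f")
roll_seq = log.roll()            # e.g. requested by a BN or a backup script
log.append("delete /a/f")        # the primary keeps writing during the checkpoint
checkpoint, catch_up = checkpoint_and_converge(log, roll_seq)
```

In this model a backup script would only copy the segments reported by `finalized_segments()`, which is the point of asking for a roll right before taking the tarball.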