[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173895#comment-13173895 ]
Todd Lipcon commented on HDFS-2291:
-----------------------------------

I plan to start working on this tomorrow. My thinking is to have a checkpoint thread which wakes up on the checkpoint interval, stops the edit log tailer thread, enters safe mode, creates a checkpoint, and comes back out of safe mode. If at any point the SB needs to process a failover, it will cancel the checkpoint (using the HDFS-2507 feature) and proceed as usual.

The remaining question I've yet to figure out is whether it should (a) save the checkpoints into the shared edits directory, or (b) save them in its own directory and then upload them to the primary via HTTP, just like the 2NN does today. "b" is probably preferable, since the shared edits directory may in fact be BK or some other journal plugin in the future, whereas "a" would break the abstraction.

If anyone has any strong opinions, please shout now :)

> HA: Checkpointing in an HA setup
> --------------------------------
>
>                 Key: HDFS-2291
>                 URL: https://issues.apache.org/jira/browse/HDFS-2291
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Aaron T. Myers
>            Assignee: Todd Lipcon
>             Fix For: HA branch (HDFS-1623)
>
>
> We obviously need to create checkpoints when HA is enabled. One thought is to
> use a third, dedicated checkpointing node in addition to the active and
> standby nodes. Another option would be to make the standby capable of also
> performing the function of checkpointing.
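
For readers following the thread, the checkpoint cycle proposed in the comment above might look roughly like the sketch below. It is a language-neutral illustration in Python, not HDFS code: `StandbyCheckpointer`, `RecordingStandby`, and every method name on them are hypothetical, and restoring the tailer and leaving safe mode after a cancelled checkpoint is an assumption about what "proceed as usual" means.

```python
import threading


class CheckpointCancelled(Exception):
    """Raised when a failover request interrupts a checkpoint."""


class StandbyCheckpointer:
    """Hypothetical model of the proposed checkpoint thread (not HDFS code)."""

    def __init__(self, standby):
        self.standby = standby                    # assumed to expose the steps used below
        self.cancel_requested = threading.Event()

    def cancel(self):
        """Called from the failover path to abort a checkpoint."""
        self.cancel_requested.set()

    def _check_cancelled(self):
        if self.cancel_requested.is_set():
            raise CheckpointCancelled()

    def run_once(self):
        """Run one checkpoint cycle; return True if a checkpoint was saved."""
        sb = self.standby
        sb.stop_edit_log_tailer()
        sb.enter_safe_mode()
        try:
            self._check_cancelled()
            # The event is handed over so a long save could poll it as well.
            sb.save_checkpoint(self.cancel_requested)
            self._check_cancelled()
            return True
        except CheckpointCancelled:
            return False
        finally:
            # Assumption: undo the checkpoint setup whether or not we finished.
            sb.leave_safe_mode()
            sb.start_edit_log_tailer()

    def run(self, interval_seconds, stop_event):
        """Wake up on every checkpoint interval until stop_event is set."""
        while not stop_event.wait(interval_seconds):
            self.run_once()


class RecordingStandby:
    """Stand-in for the standby node that only records the order of calls."""

    def __init__(self):
        self.log = []

    def stop_edit_log_tailer(self):
        self.log.append("stop_tailer")

    def start_edit_log_tailer(self):
        self.log.append("start_tailer")

    def enter_safe_mode(self):
        self.log.append("enter_safe_mode")

    def leave_safe_mode(self):
        self.log.append("leave_safe_mode")

    def save_checkpoint(self, cancel_event):
        self.log.append("save_checkpoint")
```

Calling `StandbyCheckpointer(RecordingStandby()).run_once()` walks through the steps in the order the comment lists them. Calling `cancel()` beforehand makes `run_once()` skip the save while still leaving safe mode and restarting the tailer.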