[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179188#comment-13179188 ]
Aaron T. Myers commented on HDFS-2291: -------------------------------------- Thanks a lot for providing this patch, Todd. What's below are mostly nits. I agree that there could be a few more comments for the new public methods, so I didn't include that in my feedback. # {{dfs.namenode.standby.checkpoints}} - perhaps include ".ha" in there to make it clear that this option is only applicable in an HA setup? # Might as well make the members of {{CheckpointConf}} final. # {{LOG.info("Counted txns in " + file + ": " + val.getNumTransactions());}} - Either should be removed or should not be info level. # {{prepareStopStandbyServices}} is kind of a weird name. Perhaps "prepareToStopStandbyServices" ? # "// TODO interface audience" in {{TransferFsImage}} # Does it not seem strange to you that the order of operations when setting a state is "prepareExit -> prepareEnter -> exit -> enter," instead of "prepareExit -> exit -> prepareEnter -> enter" ? i.e. I don't think there's a correctness issue here, but if I were designing a system where this set of events is triggered, I'd go with the latter. # What's the point of the changes in {{EditLogTailer}}? # "TODO: need to cancel the savenamespace operation if it's in flight" - I think this comment is no longer applicable to this patch, right? # {{LOG.info("Time for a checkpoint !");}} - while strictly accurate, this doesn't seem to be the most helpful log message. # Can we make {{CheckpointerThread}} a static inner class? # {{e.printStackTrace();}} in {{CheckpointerThread}} should probably be tossed. # Nit: in {{CheckpointerThread#doWork}}: "if(UserGroupInformation.isSecurityEnabled())" - space between "if" and "(", and curly braces around body of "if". # You use "System.currentTimeMillis" in a bunch of places. How about replacing with "o.a.h.hdfs.server.common.Util#now" ? # Does it make sense to explicitly disallow the SBN from allowing checkpoints to be uploaded to it? I realize the case when both nodes are in standby is already handled by this patch, since you don't allow checkpoints if the node already has a checkpoint for a given txid, but I mean from a principled perspective. It seems kind of odd to me that two nodes both sitting in standby would be doing checkpoint transfers at all. > HA: Checkpointing in an HA setup > -------------------------------- > > Key: HDFS-2291 > URL: https://issues.apache.org/jira/browse/HDFS-2291 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node > Affects Versions: HA branch (HDFS-1623) > Reporter: Aaron T. Myers > Assignee: Todd Lipcon > Fix For: HA branch (HDFS-1623) > > Attachments: hdfs-2291.txt, hdfs-2291.txt > > > We obviously need to create checkpoints when HA is enabled. One thought is to > use a third, dedicated checkpointing node in addition to the active and > standby nodes. Another option would be to make the standby capable of also > performing the function of checkpointing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira