[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180423#comment-13180423 ]

Hudson commented on HDFS-2291:

Integrated in Hadoop-Hdfs-HAbranch-build #38 (See [https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/38/])
HDFS-2291. Allow the StandbyNode to make checkpoints in an HA setup. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1227411
Files :
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointConf.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Checkpointer.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SaveNamespaceCancelledException.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/HAContext.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/HAState.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSNNTopology.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestEditLogsDuringFailover.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java

HA: Checkpointing in an HA setup

Key: HDFS-2291
URL: https://issues.apache.org/jira/browse/HDFS-2291
Project: Hadoop HDFS
Issue Type: Sub-task
Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Aaron T. Myers
Assignee: Todd Lipcon
Fix For: HA branch (HDFS-1623)
Attachments: hdfs-2291.txt, hdfs-2291.txt, hdfs-2291.txt, hdfs-2291.txt

We obviously need to create checkpoints when HA is enabled. One thought is to use a third, dedicated checkpointing node in addition to the active and standby nodes. Another option would be to make the standby capable of also performing the function of checkpointing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179850#comment-13179850 ]

Todd Lipcon commented on HDFS-2291:

bq. dfs.namenode.standby.checkpoints - perhaps include .ha in there to make it clear that this option is only applicable in an HA setup

Renamed to dfs.ha.standby.checkpoints and DFS_HA_STANDBY_CHECKPOINTS_KEY.

{quote}
Might as well make the members of CheckpointConf final.
LOG.info("Counted txns in " + file + ": " + val.getNumTransactions()); - Either should be removed or should not be info level.
prepareStopStandbyServices is kind of a weird name. Perhaps prepareToStopStandbyServices ?
// TODO interface audience in TransferFsImage
TODO: need to cancel the savenamespace operation if it's in flight - I think this comment is no longer applicable to this patch, right?
LOG.info("Time for a checkpoint !"); - while strictly accurate, this doesn't seem to be the most helpful log message.
e.printStackTrace(); in CheckpointerThread should probably be tossed.
Nit: in CheckpointerThread#doWork: if(UserGroupInformation.isSecurityEnabled()) - space between if and (, and curly braces around body of if.
You use System.currentTimeMillis in a bunch of places. How about replacing with o.a.h.hdfs.server.common.Util#now ?
{quote}

Fixed the above.

bq. Does it not seem strange to you that the order of operations when setting a state is prepareExit -> prepareEnter -> exit -> enter, instead of prepareExit -> exit -> prepareEnter -> enter

The point of the {{prepare*}} methods is that they have to happen before the lock is taken. So, {{prepareEnter}} can't happen after {{exit}}, because the lock already is held there. I clarified the javadoc a bit.

bq. What's the point of the changes in EditLogTailer?

In order for the test to spy on saveNamespace, I had to move the {{getFSImage}} call down. Otherwise, the spy wasn't getting picked up properly and the test was failing.

bq. Can we make CheckpointerThread a static inner class?

Currently it calls {{doCheckpoint}} in the outer class. I suppose it could be static, but it isn't really easy to test in isolation anyway, so I'm going to punt on this.

bq. Does it make sense to explicitly disallow the SBN from allowing checkpoints to be uploaded to it?

Yes and no... I sort of see your point. But, people have also discussed an external tool which would perform checkpoints for many clusters and then upload them
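
The ordering Todd describes in the comment above (both prepare hooks run first, because they must happen before the lock is taken) can be sketched roughly as follows. This is only an illustration, not code from the patch: the class `StateTransitionSketch`, the `State` interface, its method names, and the choice of `ReentrantReadWriteLock` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch; none of these names come from the HDFS-2291 patch. */
class StateTransitionSketch {
    /** Hypothetical state interface with lock-free prepare hooks. */
    interface State {
        void prepareToExit();   // may block, so it runs before the lock is taken
        void prepareToEnter();  // may block, so it runs before the lock is taken
        void exit();            // runs with the write lock held
        void enter();           // runs with the write lock held
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private State current;

    StateTransitionSketch(State initial) {
        this.current = initial;
    }

    /** prepareExit -> prepareEnter -> (lock) exit -> enter (unlock). */
    void transitionTo(State next) {
        // Both prepare hooks run without the lock. prepareToEnter cannot be
        // moved after exit(), because exit() already runs under the lock.
        current.prepareToExit();
        next.prepareToEnter();
        lock.writeLock().lock();
        try {
            current.exit();
            current = next;
            next.enter();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Builds a state that appends each hook invocation to the given log. */
    static State recording(String name, List<String> log) {
        return new State() {
            public void prepareToExit() { log.add(name + ".prepareToExit"); }
            public void prepareToEnter() { log.add(name + ".prepareToEnter"); }
            public void exit() { log.add(name + ".exit"); }
            public void enter() { log.add(name + ".enter"); }
        };
    }

    /** Performs one standby-to-active transition and returns the hook order. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        State standby = recording("standby", log);
        State active = recording("active", log);
        new StateTransitionSketch(standby).transitionTo(active);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The alternative order from the review (prepareExit, exit, prepareEnter, enter) would require either releasing the lock between exit and enter or running a potentially blocking prepare step while holding it, which is the constraint the reply points at.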
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179877#comment-13179877 ]

Aaron T. Myers commented on HDFS-2291:

bq. Yes and no... I sort of see your point. But, people have also discussed an external tool which would perform checkpoints for many clusters and then upload them

I'm still a little leery of this behavior, but I don't feel strongly about it, so let's just roll with it.

I should have said this earlier, but I'd also recommend changing prepareEnterState to prepareToEnterState, and likewise for exit.

Otherwise the patch looks good to me. +1 once that's addressed.
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179188#comment-13179188 ]

Aaron T. Myers commented on HDFS-2291:

Thanks a lot for providing this patch, Todd. What's below are mostly nits. I agree that there could be a few more comments for the new public methods, so I didn't include that in my feedback.

# {{dfs.namenode.standby.checkpoints}} - perhaps include .ha in there to make it clear that this option is only applicable in an HA setup?
# Might as well make the members of {{CheckpointConf}} final.
# {{LOG.info("Counted txns in " + file + ": " + val.getNumTransactions());}} - Either should be removed or should not be info level.
# {{prepareStopStandbyServices}} is kind of a weird name. Perhaps prepareToStopStandbyServices ?
# // TODO interface audience in {{TransferFsImage}}
# Does it not seem strange to you that the order of operations when setting a state is prepareExit -> prepareEnter -> exit -> enter, instead of prepareExit -> exit -> prepareEnter -> enter ? i.e. I don't think there's a correctness issue here, but if I were designing a system where this set of events is triggered, I'd go with the latter.
# What's the point of the changes in {{EditLogTailer}}?
# TODO: need to cancel the savenamespace operation if it's in flight - I think this comment is no longer applicable to this patch, right?
# {{LOG.info("Time for a checkpoint !");}} - while strictly accurate, this doesn't seem to be the most helpful log message.
# Can we make {{CheckpointerThread}} a static inner class?
# {{e.printStackTrace();}} in {{CheckpointerThread}} should probably be tossed.
# Nit: in {{CheckpointerThread#doWork}}: if(UserGroupInformation.isSecurityEnabled()) - space between if and (, and curly braces around body of if.
# You use System.currentTimeMillis in a bunch of places. How about replacing with o.a.h.hdfs.server.common.Util#now ?
# Does it make sense to explicitly disallow the SBN from allowing checkpoints to be uploaded to it? I realize the case when both nodes are in standby is already handled by this patch, since you don't allow checkpoints if the node already has a checkpoint for a given txid, but I mean from a principled perspective. It seems kind of odd to me that two nodes both sitting in standby would be doing checkpoint transfers at all.
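
The duplicate-upload check mentioned in the last review item ("you don't allow checkpoints if the node already has a checkpoint for a given txid") might look roughly like the sketch below. It is an assumption-laden illustration, not the logic in `GetImageServlet` or `NNStorage`: the class `ImageUploadGuardSketch` and its method `tryAccept` are hypothetical, and a real NameNode would consult its storage directories rather than an in-memory set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not the check implemented in the HDFS-2291 patch. */
class ImageUploadGuardSketch {
    // Transaction IDs for which this node already holds a checkpoint image.
    // Assumption: tracked in memory here purely for illustration.
    private final Set<Long> checkpointTxIds = ConcurrentHashMap.newKeySet();

    /**
     * Returns true if an uploaded checkpoint for this txid should be accepted,
     * false if the node already has a checkpoint at that txid.
     */
    boolean tryAccept(long txid) {
        // Set.add is atomic on this set and returns false for a duplicate,
        // so two concurrent uploads for one txid cannot both be accepted.
        return checkpointTxIds.add(txid);
    }

    public static void main(String[] args) {
        ImageUploadGuardSketch guard = new ImageUploadGuardSketch();
        System.out.println(guard.tryAccept(100L));
        System.out.println(guard.tryAccept(100L));
    }
}
```

A check of this shape is what makes the two-standbys case harmless in practice, which is separate from the principled question the review item raises about whether a standby should accept uploads at all.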
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174328#comment-13174328 ]

Eli Collins commented on HDFS-2291:

Ditto, option (b) seems preferable. I think we should minimize the difference between the 2NN and the SBN checkpointing since we'll have to support both.
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173895#comment-13173895 ]

Todd Lipcon commented on HDFS-2291:

I plan to start working on this tomorrow. My thinking is to have a checkpoint thread which wakes up on the checkpoint interval, stops the edit log tailer thread, enters safe mode, creates a checkpoint, and comes back out of safe mode. If at any point the SBN needs to process a failover, it will cancel the checkpoint (using the HDFS-2507 feature) and proceed as usual.

The remaining question I've yet to figure out is whether it should (a) save the checkpoints into the shared edits directory, or (b) save in its own directory and then upload the checkpoints to the primary via HTTP, just like the 2NN does today. Option (b) is probably preferable, since the shared edits directory may in fact be BK or some other journal plugin in the future, whereas (a) would break the abstraction.

If anyone has any strong opinions please shout now :)
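
The checkpoint cycle outlined in the comment above (stop the tailer, enter safe mode, save the image, leave safe mode, with cancellation on failover) could be sketched along these lines. Everything here is hypothetical: `StandbyCheckpointSketch`, the `Namesystem` interface, and the chunked `saveNextChunk` call are stand-ins invented for the example, the interval-based wake-up is omitted, and the real cancellation mechanism is the HDFS-2507 feature rather than a flag polled in a loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of the plan; not the StandbyCheckpointer from the patch. */
class StandbyCheckpointSketch {
    /** Hypothetical stand-in for the operations the plan mentions. */
    interface Namesystem {
        void stopEditLogTailer();
        void startEditLogTailer();
        void enterSafeMode();
        void leaveSafeMode();
        /** Writes one piece of the image; returns true once it is complete. */
        boolean saveNextChunk();
    }

    private final Namesystem ns;
    private final AtomicBoolean cancelRequested = new AtomicBoolean(false);

    StandbyCheckpointSketch(Namesystem ns) {
        this.ns = ns;
    }

    /** Called when a failover arrives so an in-flight checkpoint stops early. */
    void cancel() {
        cancelRequested.set(true);
    }

    /** Runs one checkpoint; returns true if it completed, false if cancelled. */
    boolean checkpointOnce() {
        boolean completed = false;
        ns.stopEditLogTailer();
        ns.enterSafeMode();
        try {
            // Re-check the cancel flag between chunks of work.
            while (!cancelRequested.get()) {
                if (ns.saveNextChunk()) {
                    completed = true;
                    break;
                }
            }
        } finally {
            ns.leaveSafeMode();
        }
        // Assumption: tailing resumes only after a successful checkpoint;
        // after a cancel, the failover path decides what happens next.
        if (completed) {
            ns.startEditLogTailer();
        }
        return completed;
    }

    /** Runs one checkpoint against a fake namesystem and returns the call log. */
    static List<String> demo(int chunks, int cancelAfterChunks) {
        List<String> log = new ArrayList<>();
        StandbyCheckpointSketch[] holder = new StandbyCheckpointSketch[1];
        Namesystem fake = new Namesystem() {
            int written = 0;
            public void stopEditLogTailer() { log.add("stopTailer"); }
            public void startEditLogTailer() { log.add("startTailer"); }
            public void enterSafeMode() { log.add("enterSafeMode"); }
            public void leaveSafeMode() { log.add("leaveSafeMode"); }
            public boolean saveNextChunk() {
                written++;
                log.add("chunk" + written);
                if (written == cancelAfterChunks) {
                    holder[0].cancel(); // simulate a failover mid-checkpoint
                }
                return written >= chunks;
            }
        };
        holder[0] = new StandbyCheckpointSketch(fake);
        log.add(holder[0].checkpointOnce() ? "completed" : "cancelled");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo(3, -1)); // runs to completion
        System.out.println(demo(3, 1));  // cancelled after the first chunk
    }
}
```

The `finally` block is the part worth noting: whichever way the checkpoint ends, the node leaves safe mode before control returns, so a failover is not left waiting on checkpoint state.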
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173903#comment-13173903 ]

Aaron T. Myers commented on HDFS-2291:

I support option (b), not only for the reason stated above. Option (b) also implicitly solves the problem of what to do about fsimages in the standby, as well as just seeming overall safer. I'm leery of any plan which involves the standby temporarily writing to the shared edits dir.
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158060#comment-13158060 ]

Eli Collins commented on HDFS-2291:

Agree that the SBN should be able to do checkpoints - someone running a typical 20x configuration with two hosts (NN and 2NN) should be able to keep the same hardware config (NN and SBN).
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094863#comment-13094863 ]

Todd Lipcon commented on HDFS-2291:

Ravi: the docs are right -- the 2NN needs as much memory as the NN. But the same is true of the SBN. But it's the same memory - a copy of the namespace, etc.

So, I agree that the SBN should be able to do checkpoints. We just need to implement a checkpoint abort functionality. I will look into this.
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093223#comment-13093223 ]

Suresh Srinivas commented on HDFS-2291:

My preference is to do checkpointing in the standby. If the standby is in the middle of checkpointing when it needs to take over, it should abandon the checkpoint and become active.
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093253#comment-13093253 ]

Ravi Prakash commented on HDFS-2291:

Will the standby+checkpointing node have to have twice the memory? I thought the main reason for running a secondary namenode on a different machine was because checkpointing needed just as much memory as the namenode needed to maintain metadata.
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093285#comment-13093285 ]

Suresh Srinivas commented on HDFS-2291:

bq. Will the standby+checkpointing node have to have twice the memory?

No.

bq. I thought the main reason for running a secondary namenode on a different machine was because checkpointing needed just as much memory as the namenode needed to maintain metadata.

The reason we do not do it in the primary is that checkpointing blocks operations.
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093287#comment-13093287 ]

Ravi Prakash commented on HDFS-2291:

@Suresh: Thanks! Blame not the dev for what he read in docs outdated http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html#Secondary+NameNode :)

bq. It is usually run on a different machine than the primary NameNode since its memory requirements are on the same order as the primary NameNode
[jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092225#comment-13092225 ]

Uma Maheswara Rao G commented on HDFS-2291:

{quote}
One thought is to use a third, dedicated checkpointing node in addition to the active and standby nodes.
{quote}

Introducing new nodes may create overhead in setting up clusters. We should always try to reduce the complexity of cluster setup.

{quote}
Another option would be to make the standby capable of also performing the function of checkpointing.
{quote}

IMO, the standby can do the checkpointing job. What do you say, Aaron?