[ https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856636#action_12856636 ]
Sanjay Radia commented on HDFS-1073:
------------------------------------

Todd, thanks for the design. Before you move too far forward on the patch, I would like to get consensus on the 2 alternate designs. Further, I think we need to add some items to the design doc. (Given that we may go through 2 or 3 versions of the doc, would it be better to attach it rather than post it inline?)

Please add the following items to your design doc:
* BNN restarts - how does it sync up? What if we have multiple BNNs?
* Checkpoint:
** Concurrent checkpoints (saveImage and checkpointer)
** Checkpoint done in the BNN, which is also applying the edits stream to its state - does the notion of spooling in the current design change?
** Explore the notion of having checkpoints done offline - this is not targeted for the next release but is something that we may want down the road; we need to evaluate the designs against this. (Of course, we also need to evaluate whether or not offline checkpoints are a good idea in the first place.)
* Managing edits and images in an HA environment. Here the idea is to move the image and edits to shared storage and treat the NN as "diskless". This is especially useful for federation, when there are multiple NNs. Moving/writing the image to shared storage is not difficult, and it avoids the need to send the image back to the primary NN. Moving the edits to shared storage is hard because of the latency requirements. Here BookKeeper can come to the rescue; I don't see any other solutions so far.

I am *not* proposing a very detailed design of the above items, since we don't have the resources to do all that. However, as we evaluate the 2 alternate designs, let's use the above items to guide us.

> Simpler model for Namenode's fs Image and edit Logs
> ----------------------------------------------------
>
>                 Key: HDFS-1073
>                 URL: https://issues.apache.org/jira/browse/HDFS-1073
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>            Assignee: Todd Lipcon
>
> The naming and handling of NN's fsImage and edit logs can be significantly
> improved, resulting in simpler and more robust code.