[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281537#comment-14281537 ]
Jesse Yates commented on HDFS-6440:
-----------------------------------

Some follow up after actually looking at the code:

bq. Is it possible that doWork throws IOException other than RemoteException?

Yup. In fact, the implementation of doWork at EditLogTailer#ln291 can throw an IOException if the call to the proxy for rollEditLog throws an IOException. Sure, this is a bit brittle - a RemoteException could be thrown by that call (or any other) as an IOException, but that really can't be helped because we have no other way of differentiating right now.

bq. 6. needCheckpoint == true implies sendRequests == true thus when call doCheckpoint(), sendRequest is always true.

Yup, that was a slight logic bug. I think setting sendRequest should look like:

{code:title=StandbyCheckpointer.java}
// on all nodes, we build the checkpoint. However, we only ship the checkpoint if we have a
// rollback request, are the checkpointer, or are outside the quiet period.
boolean sendRequest = needCheckpoint
    && (isPrimaryCheckPointer || secsSinceLast >= checkpointConf.getQuietPeriod());
{code}

to actually not send the request every time - it wasn't going to break anything before, but now it should actually conserve bandwidth :)

bq. 7. Could you break this line

My IDE has that at 99 chars long - isn't 100 chars the standard line width? However, I moved the IOE from the rest of the signature up to the second half of the method declaration.

bq. 11. Finally, could you reduce the changes in `MiniDFSCluster.java`, as many of them are not changed, e.g. `MiniDFSCluster.java:911-986`.

I think I'm at the minimal number of changes there. Git frequently reports line adds and removes when things move around a bit, as this patch necessitates. Fortunately, they should be easy to ignore... but let me know if I'm missing what you are getting at.
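
To make the sendRequest condition proposed above easier to check, here is a minimal standalone sketch of just the boolean expression. It is not code from the patch: the class name {{SendRequestSketch}}, the helper {{shouldSendRequest}}, and the 60-second quiet period are hypothetical, chosen only for illustration. The parameters stand in for the similarly named variables in the StandbyCheckpointer.java snippet.

```java
public class SendRequestSketch {

    // Hypothetical helper mirroring the expression proposed for StandbyCheckpointer.java:
    // a checkpoint is shipped only when one is needed AND this node is either the
    // primary checkpointer or already past the quiet period.
    static boolean shouldSendRequest(boolean needCheckpoint,
                                     boolean isPrimaryCheckPointer,
                                     long secsSinceLast,
                                     long quietPeriodSecs) {
        return needCheckpoint
            && (isPrimaryCheckPointer || secsSinceLast >= quietPeriodSecs);
    }

    public static void main(String[] args) {
        long quietPeriodSecs = 60; // assumed value, for illustration only

        // Primary checkpointer that needs a checkpoint: ships regardless of the quiet period.
        System.out.println(shouldSendRequest(true, true, 5, quietPeriodSecs));

        // Non-primary node still inside the quiet period: builds but does not ship.
        System.out.println(shouldSendRequest(true, false, 5, quietPeriodSecs));

        // Non-primary node that has reached the quiet period: ships.
        System.out.println(shouldSendRequest(true, false, 60, quietPeriodSecs));

        // No checkpoint needed: never ships.
        System.out.println(shouldSendRequest(false, true, 100, quietPeriodSecs));
    }
}
```

The second case is the one that changes: a non-primary standby that needs a checkpoint but is still inside the quiet period no longer sends a request.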

> Support more than 2 NameNodes
> -----------------------------
>
>                 Key: HDFS-6440
>                 URL: https://issues.apache.org/jira/browse/HDFS-6440
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: auto-failover, ha, namenode
>    Affects Versions: 2.4.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>         Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, hdfs-multiple-snn-trunk-v0.patch
>
>
> Most of the work is already done to support more than 2 NameNodes (one
> active, one standby). This would be the last bit to support running multiple
> _standby_ NameNodes; one of the standbys should be available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, some
> complexity around managing the checkpointing, and updating a whole lot of
> tests.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)