[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274531#comment-14274531 ]
Lei (Eddy) Xu commented on HDFS-6440: ------------------------------------- [~jesse_yates] Thank you so much for working on the patch so quickly. It looks good overall. I have a few comments on the latest patch. 1. {{EditorLogTailer#getActiveNodeProxy}} does not actually throw {{IOException}}. Could you remove it from the function signature? 2. Could you add some descriptions about the expected exceptions for {{MultiNameNodeProxy#doWork()}}, e.g., {code:title=EditLogTailer.java} 387 try { 388 T ret = doWork(); 389 // reset the loop count on success 390 nnLoopCount = 0; 391 return ret; 392 } catch (RemoteException e) { {code} Is it possible that {{doWork}} throws {{IOException}} other than {{RemoteException}}? 3. Could you enforce that {{maxRetries}} is positive after the following code? {code} 157 maxRetries = conf.getInt(DFSConfigKeys.DFS_HA_TAILEDITS_ALL_NAMESNODES_RETRY_KEY, 158 + DFSConfigKeys.DFS_HA_TAILEDITS_ALL_NAMESNODES_RETRY_DEFAULT); {code} 4. {{StandbyCheckpointer#activeNNAddresses}} is confusing, since there should be only one active NN. In the old code, since there is only 1 ANN and 1SNN, so SNN can assume other NN is active. 5. I guess the following code is a typo: {{ie}} should be set in catch()? {code:title=StandbyCheckpointer.java} 248 } catch (InterruptedException e) { 249 ie = null; 250 break; 251 } {code} 6. {{needCheckpoint == true}} implies {{sendRequests == true}} thus when call {{doCheckpiont()}}, {{sendRequest}} is always {{true}}. {code} 414 if (needCheckpoint) { 415 doCheckpoint(sendRequest); {code} 7. Could you break this line {code} private NameNodeInfo createNameNode(Configuration conf, boolean format, StartupOption operation {code} 8. Are the changes in 'log4j.properties' necessary? 9. There is a typo in {{dfs.hs....}} {code} public static final String DFS_HA_TAILEDITS_ALL_NAMESNODES_RETRY_KEY = "dfs.hs.tail-edits.namenode-retries"; {code} 10. I saw you removed {final}s from In my understanding, it is for easier updating {{MiniDFSCluster#namenodes}}, as it is a Multimap. But I still feel that it is safer to set these fields as final and you can use `Multipmap#remove(key, value)` to replace NameNodeInfo? {code} 537 public NameNode nameNode; 538 Configuration conf; 539 String nameserviceId; 540 String nnId; {code} 11. Finally, could you reduce the changes in `MiniDFSCluster.java`, as many of them are not changed, e.g. `MiniDFSCluster.java:911-986`. Thanks again, [~jesse_yates]! > Support more than 2 NameNodes > ----------------------------- > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode > Affects Versions: 2.4.0 > Reporter: Jesse Yates > Assignee: Jesse Yates > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)