[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274531#comment-14274531
 ] 

Lei (Eddy) Xu commented on HDFS-6440:
-------------------------------------

[~jesse_yates] Thank you so much for working on the patch so quickly. It looks 
good overall.

I have a few comments on the latest patch.

 1. {{EditorLogTailer#getActiveNodeProxy}} does not actually throw 
{{IOException}}. Could you remove it from the function signature?

2. Could you add some descriptions about the expected exceptions for 
{{MultiNameNodeProxy#doWork()}}, e.g., 

{code:title=EditLogTailer.java}
387             try {
388               T ret = doWork();
389               // reset the loop count on success
390               nnLoopCount = 0;
391               return ret;
392             } catch (RemoteException e) {
{code}

Is it possible that {{doWork}} throws {{IOException}} other than 
{{RemoteException}}?

3. Could you enforce that {{maxRetries}} is positive after the following code?

{code}
157         maxRetries = 
conf.getInt(DFSConfigKeys.DFS_HA_TAILEDITS_ALL_NAMESNODES_RETRY_KEY,
158             +      
DFSConfigKeys.DFS_HA_TAILEDITS_ALL_NAMESNODES_RETRY_DEFAULT);
{code}

4. {{StandbyCheckpointer#activeNNAddresses}} is confusing, since there should 
be only one active NN. In the old code, since there is only 1 ANN and 1SNN, so 
SNN can assume other NN is active.

5. I guess the following code is a typo: {{ie}} should be set in catch()?

{code:title=StandbyCheckpointer.java}
248        } catch (InterruptedException e) {
249             ie = null;
250             break;
251           }
{code}

6. {{needCheckpoint == true}} implies {{sendRequests == true}} thus when call 
{{doCheckpiont()}},  {{sendRequest}} is always {{true}}.

{code}
414      if (needCheckpoint) {
415                 doCheckpoint(sendRequest);
{code}


7. Could you break this line
{code}
private NameNodeInfo createNameNode(Configuration conf, boolean format, 
StartupOption operation
{code}

8. Are the changes in 'log4j.properties' necessary?

9. There is a typo in {{dfs.hs....}}
{code}
public static final String DFS_HA_TAILEDITS_ALL_NAMESNODES_RETRY_KEY = 
"dfs.hs.tail-edits.namenode-retries";
{code}

10. I saw you removed {final}s from  In my understanding, it is for easier 
updating {{MiniDFSCluster#namenodes}}, as it is a Multimap. But I still feel 
that it is safer to set these fields as final and you can use 
`Multipmap#remove(key, value)` to replace NameNodeInfo?

{code}
537         public NameNode nameNode;
538         Configuration conf;
539         String nameserviceId;
540         String nnId;
{code}

11. Finally, could you reduce the changes in `MiniDFSCluster.java`, as many of 
them are not changed, e.g. `MiniDFSCluster.java:911-986`.

Thanks again, [~jesse_yates]!

> Support more than 2 NameNodes
> -----------------------------
>
>                 Key: HDFS-6440
>                 URL: https://issues.apache.org/jira/browse/HDFS-6440
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: auto-failover, ha, namenode
>    Affects Versions: 2.4.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>         Attachments: Multiple-Standby-NameNodes_V1.pdf, 
> hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
> hdfs-multiple-snn-trunk-v0.patch
>
>
> Most of the work is already done to support more than 2 NameNodes (one 
> active, one standby). This would be the last bit to support running multiple 
> _standby_ NameNodes; one of the standbys should be available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, some 
> complexity around managing the checkpointing, and updating a whole lot of 
> tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to