[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240452#comment-14240452 ]

Jesse Yates commented on HDFS-6440:
-----------------------------------

bq. What is the procedure for adding or replacing NNs?
Nothing beyond what is currently supported. The problem is that all the nodes 
currently have the NNs hard-coded in config. What you could do is roll the NNs 
with the new NN config, then roll the rest of the clients with the new config 
as well once the new NN is up to date. I don't think you would do anything 
differently than what is done today; see the sketch below.
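
For illustration, here is a minimal sketch of the kind of static HA configuration every NN, DN, and client carries today, extended to a third NN. The nameservice name and hosts are hypothetical; the keys are the standard HA configuration keys.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class MultiNnConfigSketch {
  public static Configuration build() {
    Configuration conf = new HdfsConfiguration();
    conf.set("dfs.nameservices", "mycluster");
    // Adding a third NN means appending it here and rolling the new config
    // out to the NNs first, then to the DNs and clients.
    conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2,nn3");
    conf.set("dfs.namenode.rpc-address.mycluster.nn1", "host1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.mycluster.nn2", "host2.example.com:8020");
    conf.set("dfs.namenode.rpc-address.mycluster.nn3", "host3.example.com:8020");
    return conf;
  }
}
{code}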

bq. Could it support dynamically adding NNs without downtime?
Not really. You would have to push the downtime question up a level and rely 
on something like ZK to maintain the list of NNs (in the simplest approach). It 
reduces to a group membership problem.

bq. Would it be possible to avoid having multiple SNNs upload fsimages with 
trivial deltas in a short time?
Sure. This was the idea behind adding the 'primary checkpointer' logic: if you 
are not the primary, you back off for 2x the usual wait period, because you 
assume the primary is up and handling the checkpointing, but you check again 
every so often to make sure it hasn't gotten too far behind. Obviously, the 
'primary checkpointer' role could ping-pong back and forth between SNNs, but 
generally one SNN gets the lead and keeps it; a sketch of the backoff decision 
follows.
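
To make that concrete, here is a minimal sketch of the backoff decision described above; the field and method names are hypothetical, not the actual patch API.

{code:java}
import org.apache.hadoop.util.Time;

/** Illustrative sketch only; names and thresholds are hypothetical. */
class CheckpointBackoffSketch {
  boolean isPrimaryCheckpointer; // does this SNN currently hold the 'primary' role?
  long checkpointPeriodMs;       // the usual checkpoint interval
  long maxUncheckpointedTxns;    // how far behind we tolerate before stepping in anyway
  long lastCheckpointMs;

  boolean shouldCheckpoint(long uncheckpointedTxns) {
    // A non-primary SNN waits 2x the usual period, assuming the primary is up
    // and handling the checkpointing...
    long wait = isPrimaryCheckpointer ? checkpointPeriodMs : 2 * checkpointPeriodMs;
    boolean due = Time.monotonicNow() - lastCheckpointMs >= wait;
    // ...but still checkpoints if the primary appears to have fallen too far behind.
    return due || uncheckpointedTxns > maxUncheckpointedTxns;
  }
}
{code}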

bq. Would it be possible that this behavior makes other SNNs miss the edit logs?
It's possible, but that's a somewhat rare occurrence, as you can generally bring 
the NN back up fairly quickly. If it's really far behind, you can bootstrap it 
up to the current NN's state and run it from there. In practice, we haven't seen 
any problems with this.

bq. Does this work support rolling upgrade?
I'm not aware that it would change it.

bq. Would it make client failover more complicated?
Now, instead of failing over between two servers, the client can fail over 
between N. I believe the client code already supports this as-is.
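
For reference, a sketch of the client side under the hypothetical configuration above: the existing ConfiguredFailoverProxyProvider walks the list of NNs configured for the nameservice until it reaches the active one, so a third entry is simply one more candidate.

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClientFailoverSketch {
  public static void main(String[] args) throws Exception {
    // Reuse the hypothetical three-NN configuration sketched earlier.
    Configuration conf = MultiNnConfigSketch.build();
    conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
    // The client resolves the logical URI against whichever configured NN is active.
    FileSystem fs = FileSystem.get(new URI("hdfs://mycluster"), conf);
    System.out.println(fs.exists(new Path("/")));
  }
}
{code}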

bq. What would be the impact on the DN side?
Basically, the DNs just send block reports to more than 2 NNs. That can start 
to cause some bandwidth congestion at some point, but I don't think it would be 
a problem for clusters with up to at least 5 or 7 NNs.

bq. What are the changes on the test resources files (hadoop-*-reserved.tgz) ?
The mini-cluster is designed to support only two NNs, down to the files it 
writes to maintain the directory layout. Unfortunately, it doesn't manage the 
directories in any easily updated way, so I had to rip out the existing 
directory structure it uses and replace it with something a little more 
flexible. The changes to those archive files are just to support this updated 
structure for the mini-cluster.

> Support more than 2 NameNodes
> -----------------------------
>
>                 Key: HDFS-6440
>                 URL: https://issues.apache.org/jira/browse/HDFS-6440
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: auto-failover, ha, namenode
>    Affects Versions: 2.4.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>         Attachments: Multiple-Standby-NameNodes_V1.pdf, 
> hdfs-6440-cdh-4.5-full.patch, hdfs-multiple-snn-trunk-v0.patch
>
>
> Most of the work is already done to support more than the current 2 
> NameNodes (one active, one standby). This would be the last bit to support 
> running multiple _standby_ NameNodes; one of the standbys should be 
> available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, some 
> complexity around managing the checkpointing, and updating a whole lot of 
> tests.


