[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240366#comment-14240366
 ] 

Lei (Eddy) Xu commented on HDFS-6440:
-------------------------------------

[~jesse_yates] Thanks for working on this cool feature. We read your 
design doc and have only a few questions:

# What is the procedure for adding or replacing NNs? Could it support 
dynamically adding NNs without downtime?
# It seems that whether to upload an fsimage is mostly determined by the SNN 
(e.g., when it finishes a checkpoint). Would it be possible to prevent multiple 
SNNs from uploading fsimages with trivial deltas in a short time? E.g., let the 
ANN reject upload requests if 
{{lastUploadTime > now - quiet period && num of edits < N}}?
# It seems that QJM inherits the behavior of the current ANN/SNN design, in 
which edit logs are purged after *_one_* SNN uploads an fsimage. Could this 
cause other SNNs to miss edit logs? E.g., if an SNN crashes and comes back 
online after the edit logs it needs have been purged?
# Does this work support rolling upgrade?
# Would it make client failover more complicated?
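
To make the condition in question 2 concrete, here is a minimal sketch of the 
check we have in mind. The class, method, and parameter names are hypothetical 
and do not come from the patch or from existing HDFS code; the quiet period and 
the edit threshold N are assumed to be configurable values.

```java
/** Hypothetical sketch of the proposed ANN-side check; not part of any patch. */
final class UploadThrottle {
    private UploadThrottle() {}

    /**
     * Returns true if the ANN should reject an fsimage upload request:
     * the previous upload is still inside the quiet period AND the new
     * image covers fewer than minEdits (N) edits since that upload.
     */
    static boolean shouldReject(long lastUploadTimeMs, long nowMs,
                                long quietPeriodMs, long numEdits,
                                long minEdits) {
        // lastUploadTime > now - quiet period
        boolean uploadedRecently = lastUploadTimeMs > nowMs - quietPeriodMs;
        // num of edits < N
        boolean trivialDelta = numEdits < minEdits;
        return uploadedRecently && trivialDelta;
    }

    public static void main(String[] args) {
        // Assumed example values: 1 s quiet period, N = 100 edits.
        System.out.println(shouldReject(1000L, 1500L, 1000L, 5L, 100L));
        System.out.println(shouldReject(1000L, 5000L, 1000L, 5L, 100L));
    }
}
```

With this shape, an upload is still accepted if either the quiet period has 
elapsed or enough edits have accumulated, so a large delta is never held back.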

And some minor concerns:
# What would be the impact on the DN side?
# What are the changes to the test resource files (hadoop-*-reserved.tgz)?

Thanks again for this awesome work!

> Support more than 2 NameNodes
> -----------------------------
>
>                 Key: HDFS-6440
>                 URL: https://issues.apache.org/jira/browse/HDFS-6440
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: auto-failover, ha, namenode
>    Affects Versions: 2.4.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>         Attachments: Multiple-Standby-NameNodes_V1.pdf, 
> hdfs-6440-cdh-4.5-full.patch, hdfs-multiple-snn-trunk-v0.patch
>
>
> Most of the work is already done to support more than 2 NameNodes (one 
> active, one standby). This would be the last bit to support running multiple 
> _standby_ NameNodes; one of the standbys should be available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, some 
> complexity around managing the checkpointing, and updating a whole lot of 
> tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
