[jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

lindongdong (JIRA) Sat, 10 Feb 2018 20:05:16 -0800

    [ https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359766#comment-16359766 ]


lindongdong commented on HDFS-8693:
-----------------------------------

I ran into some errors with this patch.

Suppose the cluster has 3 nodes, A, B and C, with the NNs on A and B.

When we remove B and install a new SNN on C, all DNs fail to register with the 
new SNN. The errors look like this:
{code:java}
2018-02-09 19:49:02,728 | WARN | DataNode: 
[[[DISK]file:/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb_1/bbbbbbbbbbbbbbbbb-bbbbbbbbb_2/bbbbbb_3/b_4/b-5/B-2/B-3/B-4/bbbbbbbbbbbbbbb-bbbbbbbbb/hadoop/data1/dn/]]
 heartbeating to 189-219-255-103/189.219.255.103:25006 | Problem connecting to 
server: 189-219-255-103/189.219.255.103:25006 | BPServiceActor.java:197
2018-02-09 19:49:07,731 | WARN | DataNode: 
[[[DISK]file:/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb_1/bbbbbbbbbbbbbbbbb-bbbbbbbbb_2/bbbbbb_3/b_4/b-5/B-2/B-3/B-4/bbbbbbbbbbbbbbb-bbbbbbbbb/hadoop/data1/dn/]]
 heartbeating to 189-219-255-103/189.219.255.103:25006 | Exception encountered 
while connecting to the server : javax.security.sasl.SaslException: GSS 
initiate failed [Caused by GSSException: No valid credentials provided 
(Mechanism level: Failed to find any Kerberos tgt)] | Client.java:726
{code}

> refreshNamenodes does not support adding a new standby to a running DN
> ----------------------------------------------------------------------
>
>                 Key: HDFS-8693
>                 URL: https://issues.apache.org/jira/browse/HDFS-8693
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, ha
>    Affects Versions: 2.6.0
>            Reporter: Jian Fang
>            Assignee: Ajith S
>            Priority: Critical
>             Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
>         Attachments: HDFS-8693.02.patch, HDFS-8693.03.patch, HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA 
> support:
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> The goal was to refresh the name nodes on the data nodes after I replaced one 
> name node with a new one, so that I would not need to restart the data nodes. 
> However, I got the following error:
> refreshNamenodes: HA does not currently support adding a new standby to a 
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code 
> snippet, which led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException(
>         "HA does not currently support adding a new standby to a running DN. "
>         + "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>   }
> }
> It looks like the refreshNamenodes command is an incomplete feature. 
> Unfortunately, bringing up a new name node on a replacement instance is 
> critical for auto provisioning a Hadoop cluster with HDFS HA support. Without 
> this support, the HA feature cannot really be used. I also observed that the 
> new standby name node on the replacement instance could get stuck in safe 
> mode because no data nodes check in with it. Even with a rolling restart, it 
> may take quite some time to restart all data nodes in a big cluster, for 
> example one with 4000 data nodes. Besides, restarting DNs is far too 
> intrusive and is not a preferable operation in production. It also increases 
> the chance of a double failure, because the standby name node is not really 
> ready for a failover if the current active name node fails.
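
For readers following the quoted refreshNNList snippet, the check it performs can be reproduced outside Hadoop. The sketch below is only an illustration and not the Hadoop code: it swaps Guava's Sets.symmetricDifference for plain java.util sets, and the class name NnListDiff, the method nnListChanged, the host names and the port 8020 are all hypothetical. It shows why replacing one NN makes the old and new address sets differ, which is the condition under which the 2.6.0 code throws.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop.
final class NnListDiff {

    // Returns the elements that are in exactly one of the two sets.
    static <T> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        Set<T> union = new HashSet<>(a);
        union.addAll(b);
        Set<T> common = new HashSet<>(a);
        common.retainAll(b);
        union.removeAll(common);
        return union;
    }

    // True when the set of NN addresses changed, ignoring order.
    static boolean nnListChanged(List<InetSocketAddress> oldAddrs,
                                 List<InetSocketAddress> newAddrs) {
        return !symmetricDifference(new HashSet<>(oldAddrs),
                                    new HashSet<>(newAddrs)).isEmpty();
    }

    public static void main(String[] args) {
        // createUnresolved avoids DNS lookups; hosts and port are made up.
        InetSocketAddress a = InetSocketAddress.createUnresolved("nn-a.example.com", 8020);
        InetSocketAddress b = InetSocketAddress.createUnresolved("nn-b.example.com", 8020);
        InetSocketAddress c = InetSocketAddress.createUnresolved("nn-c.example.com", 8020);

        // NN on B replaced by a new NN on C: the sets differ.
        System.out.println(nnListChanged(Arrays.asList(a, b), Arrays.asList(a, c))); // true
        // Same NNs listed in a different order: no change.
        System.out.println(nnListChanged(Arrays.asList(a, b), Arrays.asList(b, a))); // false
    }
}
```

Because the comparison is set-based, reordering the configured NNs is harmless, while any added or removed address counts as a change.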



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
