[ 
https://issues.apache.org/jira/browse/HDFS-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated HDFS-16508:
-----------------------
    Description: 
HA is enabled, and we have two NameNodes: nn1 and nn2.

When starting the cluster, nn1 fails at the very beginning, and nn2 
transitions to the active state. The cluster can provide services normally.

However, when we try to get the safe mode status or wait for safe mode to 
exit, our dfsadmin command fails with an I/O exception: cannot connect to nn1.

The root cause seems to be located here:


{code:java}
//DFSAdmin.class

public void setSafeMode(String[] argv, int idx) throws IOException {

…

if (isHaEnabled) {
      String nsId = dfsUri.getHost();
      List<ProxyAndInfo<ClientProtocol>> proxies =
          HAUtil.getProxiesForAllNameNodesInNameservice(
          dfsConf, nsId, ClientProtocol.class);
      for (ProxyAndInfo<ClientProtocol> proxy : proxies) {
        ClientProtocol haNn = proxy.getProxy();
        // The loop always queries the first NN (nn1) first, and the command
        // fails with an IOException when nn1 is down.
        boolean inSafeMode = haNn.setSafeMode(action, false);
        if (waitExitSafe) {
          inSafeMode = waitExitSafeMode(haNn, inSafeMode);
        }
        System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF")
            + " in " + proxy.getAddress());
      }
    } 
…
}

{code}

Actually, I wonder whether we need to query or wait on every NameNode here 
when HA is enabled.
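
One possible direction, sketched under assumptions rather than taken from the Hadoop code base: query each NameNode independently, skip the ones that cannot be reached, and fail only if none of them answers. The names `SafeModeSketch`, `SafeModeClient`, and `querySafeMode` are hypothetical stand-ins for illustration; `SafeModeClient` plays the role of the per-NameNode `ClientProtocol` proxy in the excerpt above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SafeModeSketch {

  /** Hypothetical stand-in for a per-NameNode ClientProtocol proxy. */
  interface SafeModeClient {
    boolean isInSafeMode() throws IOException;
  }

  /**
   * Queries every NameNode and skips the ones that cannot be reached.
   * Throws only if no NameNode answered at all.
   */
  static Map<String, Boolean> querySafeMode(Map<String, SafeModeClient> nameNodes)
      throws IOException {
    Map<String, Boolean> results = new LinkedHashMap<>();
    List<String> unreachable = new ArrayList<>();
    for (Map.Entry<String, SafeModeClient> entry : nameNodes.entrySet()) {
      try {
        results.put(entry.getKey(), entry.getValue().isInSafeMode());
      } catch (IOException e) {
        // Record the failure and move on instead of aborting the whole command.
        unreachable.add(entry.getKey());
        System.err.println("Cannot reach " + entry.getKey() + ": " + e.getMessage());
      }
    }
    if (results.isEmpty()) {
      throw new IOException("No NameNode reachable: " + unreachable);
    }
    return results;
  }

  public static void main(String[] args) throws IOException {
    Map<String, SafeModeClient> nameNodes = new LinkedHashMap<>();
    // Simulated cluster: nn1 is down, nn2 is active and out of safe mode.
    nameNodes.put("nn1", () -> { throw new IOException("connection refused"); });
    nameNodes.put("nn2", () -> false);

    for (Map.Entry<String, Boolean> r : querySafeMode(nameNodes).entrySet()) {
      System.out.println("Safe mode is " + (r.getValue() ? "ON" : "OFF")
          + " in " + r.getKey());
    }
  }
}
```

With this shape, a dead nn1 produces a warning on stderr while the status of nn2 is still reported. Whether the real command should instead contact only the active NameNode is the open question above.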




> When nn1 fails at the very beginning, the admin command that waits to exit 
> safe mode fails
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-16508
>                 URL: https://issues.apache.org/jira/browse/HDFS-16508
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 3.3.1
>            Reporter: May
>            Priority: Major
>
> HA is enabled, and we have two NameNodes: nn1 and nn2.
> When starting the cluster, nn1 fails at the very beginning, and nn2 
> transitions to the active state. The cluster can provide services normally.
> However, when we try to get the safe mode status or wait for safe mode to 
> exit, our dfsadmin command fails with an I/O exception: cannot connect to nn1.
> The root cause seems to be located here:
> {code:java}
> //DFSAdmin.class
> public void setSafeMode(String[] argv, int idx) throws IOException {
> …
> if (isHaEnabled) {
>       String nsId = dfsUri.getHost();
>       List<ProxyAndInfo<ClientProtocol>> proxies =
>           HAUtil.getProxiesForAllNameNodesInNameservice(
>           dfsConf, nsId, ClientProtocol.class);
>       for (ProxyAndInfo<ClientProtocol> proxy : proxies) {
>         ClientProtocol haNn = proxy.getProxy();
>         // The loop always queries the first NN (nn1) first, and the command
>         // fails with an IOException when nn1 is down.
>         boolean inSafeMode = haNn.setSafeMode(action, false);
>         if (waitExitSafe) {
>           inSafeMode = waitExitSafeMode(haNn, inSafeMode);
>         }
>         System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF")
>             + " in " + proxy.getAddress());
>       }
>     } 
> …
> }
> {code}
> Actually, I wonder whether we need to query or wait on every NameNode here 
> when HA is enabled.


