[ 
https://issues.apache.org/jira/browse/HDFS-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071388#comment-15071388
 ] 

Masatake Iwasaki commented on HDFS-9376:
----------------------------------------

The failover thread in {{HAStressTestHarness}} invokes failover periodically, 
sleeping for a fixed time between rounds. {{msBetweenFailovers}} is set to 1000 
ms for {{TestSeveralNameNodes}}.

{code}
        for (int i = 0; i < nns; i++) {
          int next = (i + 1) % nns;
          ...
          cluster.transitionToStandby(i);
          cluster.transitionToActive(next);
          ...
          Thread.sleep(msBetweenFailovers);
{code}

The client's retry proxy sleeps on failover for a time that grows exponentially 
with the number of retries. The client can sleep for up to around 15 seconds if 
the operation fails repeatedly. As a result, the client may not get enough 
effective run time.

{noformat}
  2015-12-24 12:22:00,784 [Thread-250] INFO  retry.RetryInvocationHandler 
(RetryInvocationHandler.java:invoke(147)) - Exception while invoking create of 
class ClientNamenodeProtocolTranslatorPB over localhost/127.0.0.1:42201 after 4 
fail over attempts. Trying to fail over after sleeping for 10161ms.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
Operation category WRITE is not supported in state standby. Visit 
https://s.apache.org/sbnn-error
{noformat}
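
To illustrate how quickly the client sleep outgrows the failover interval, 
here is a minimal sketch of capped exponential backoff with jitter. It is not 
the actual Hadoop retry policy code. The class and method names are 
hypothetical, and the base delay (500 ms), the cap (15000 ms) and the jitter 
range [0.5, 1.5) are assumptions chosen to be consistent with the log above.

```java
public class FailoverBackoffSketch {

    // Capped exponential delay: min(baseMillis * 2^failovers, capMillis).
    // The shift is clamped so the multiplication cannot overflow a long
    // for the small base values used here.
    static long cappedDelay(long baseMillis, int failovers, long capMillis) {
        int shift = Math.min(failovers, 30);
        return Math.min(baseMillis * (1L << shift), capMillis);
    }

    // Applies a jitter factor to the capped delay. The factor is passed in
    // (assumed to lie in [0.5, 1.5)) so that the result is deterministic.
    static long sleepMillis(long baseMillis, int failovers, long capMillis,
                            double jitter) {
        return (long) (cappedDelay(baseMillis, failovers, capMillis) * jitter);
    }

    public static void main(String[] args) {
        long baseMillis = 500;           // assumed base sleep time
        long capMillis = 15000;          // assumed maximum base sleep time
        long msBetweenFailovers = 1000;  // value used by TestSeveralNameNodes

        for (int failovers = 0; failovers <= 6; failovers++) {
            long delay = cappedDelay(baseMillis, failovers, capMillis);
            // Number of whole failover intervals that fit in one client sleep.
            long intervalsSleptThrough = delay / msBetweenFailovers;
            System.out.printf(
                "failovers=%d delay=%dms failover intervals during sleep=%d%n",
                failovers, delay, intervalsSleptThrough);
        }
    }
}
```

Under these assumptions, 4 failover attempts give a delay of 8000 ms before 
jitter, or 4000 to 12000 ms after it, which matches the 10161 ms sleep in the 
log. From the third attempt on, a single client sleep spans several 1000 ms 
failover intervals.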

> TestSeveralNameNodes fails occasionally
> ---------------------------------------
>
>                 Key: HDFS-9376
>                 URL: https://issues.apache.org/jira/browse/HDFS-9376
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Masatake Iwasaki
>
> TestSeveralNameNodes has been failing in precommit builds.  It usually times 
> out on waiting for the last thread to finish writing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
