Fengnan Li created HDFS-14914:
---------------------------------

             Summary: Observer should throw StandbyException in Safemode
                 Key: HDFS-14914
                 URL: https://issues.apache.org/jira/browse/HDFS-14914
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Fengnan Li
            Assignee: Fengnan Li
         Attachments: HDFS-14914-001.patch

When an Observer is in safemode, calling getBlockLocations makes it throw 
RetriableException, as in 
[HDFS-13898|https://issues.apache.org/jira/browse/HDFS-13898]. However, during 
startup safemode can last a very long time, so retrying does not help much 
here.

What makes it worse is that when Routers talk to Observers, since the Router 
distinguishes StandbyException from RetriableException, it will keep retrying 
(3 times by default) and then return a RetriableException to the client. The 
client will then retry on the same Router, and so against the same Observer, 
10 times by default, resulting in 3 * 10 = 30 retries per call.
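
The multiplication above can be illustrated with a small, self-contained 
simulation. This is only a sketch: the classes below are hypothetical stand-ins 
(not Hadoop's actual Router, client, or exception classes), and it assumes 
that each layer makes its default number of attempts (3 and 10) against an 
Observer that never leaves safemode.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RetryAmplification {
    /** Local stand-in for Hadoop's RetriableException; not the real class. */
    static class RetriableException extends Exception {
        RetriableException(String msg) { super(msg); }
    }

    /** Hypothetical Observer that stays in safemode and counts incoming calls. */
    static class SafemodeObserver {
        final AtomicInteger calls = new AtomicInteger();

        void getBlockLocations() throws RetriableException {
            calls.incrementAndGet();
            throw new RetriableException("Observer is in safemode");
        }
    }

    /** Hypothetical Router: retries the same Observer, then rethrows to the client. */
    static void routerCall(SafemodeObserver observer, int routerAttempts)
            throws RetriableException {
        RetriableException last = null;
        for (int i = 0; i < routerAttempts; i++) {
            try {
                observer.getBlockLocations();
                return;
            } catch (RetriableException e) {
                last = e; // same Observer is tried again
            }
        }
        throw last != null ? last : new RetriableException("no attempts made");
    }

    /** Hypothetical client: retries the same Router; returns total Observer calls. */
    static int clientCall(int routerAttempts, int clientAttempts) {
        SafemodeObserver observer = new SafemodeObserver();
        for (int i = 0; i < clientAttempts; i++) {
            try {
                routerCall(observer, routerAttempts);
                break;
            } catch (RetriableException e) {
                // client retries on the same Router
            }
        }
        return observer.calls.get();
    }

    public static void main(String[] args) {
        System.out.println(clientCall(3, 10)); // prints 30
    }
}
```

Under these assumptions, every one of the 30 calls lands on the same Observer, 
and none can succeed while it remains in safemode.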

The change is to make the Observer throw StandbyException in safemode, so that 
the Router fails over and can immediately try another Observer or the Active 
namenode (depending on the design). The current ObserverReadProxyProvider is 
not affected, since both RetriableException and StandbyException make it 
fail over.
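
A minimal sketch of the intended difference in behavior follows. It is not the 
Router's actual code: the Namenode interface, the invoke method, and the 
exception classes are hypothetical stand-ins defined locally, and the ordered 
list of namenodes is an assumption made for illustration. It models a caller 
that retries the same namenode on RetriableException but moves on to the next 
one on StandbyException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class FailoverSketch {
    /** Local stand-ins for the Hadoop exception types; not the real classes. */
    static class StandbyException extends Exception {
        StandbyException(String msg) { super(msg); }
    }
    static class RetriableException extends Exception {
        RetriableException(String msg) { super(msg); }
    }

    /** Hypothetical namenode handle. */
    interface Namenode {
        String getBlockLocations(String path) throws StandbyException, RetriableException;
    }

    /** Retry the same namenode on RetriableException; fail over on StandbyException. */
    static String invoke(List<Namenode> namenodes, String path, int maxAttempts)
            throws Exception {
        Exception last = null;
        for (Namenode nn : namenodes) {
            boolean failover = false;
            for (int attempt = 0; attempt < maxAttempts && !failover; attempt++) {
                try {
                    return nn.getBlockLocations(path);
                } catch (StandbyException e) {
                    last = e;
                    failover = true; // move on to the next namenode
                } catch (RetriableException e) {
                    last = e; // try the same namenode again
                }
            }
            if (!failover) {
                break; // retries exhausted: surface the error to the caller
            }
        }
        throw last != null ? last : new IllegalStateException("no call was attempted");
    }

    /** Counts calls made to a safemode Observer placed ahead of a healthy namenode. */
    static int observerCalls(boolean throwStandby, int maxAttempts) {
        AtomicInteger calls = new AtomicInteger();
        Namenode observer = path -> {
            calls.incrementAndGet();
            if (throwStandby) {
                throw new StandbyException("Observer is in safemode");
            }
            throw new RetriableException("Observer is in safemode");
        };
        Namenode active = path -> "blocks for " + path;
        try {
            invoke(Arrays.asList(observer, active), "/tmp/file", maxAttempts);
        } catch (Exception e) {
            // RetriableException surfaced after the retries were exhausted
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(observerCalls(false, 3)); // prints 3
        System.out.println(observerCalls(true, 3));  // prints 1
    }
}
```

In this sketch, the safemode Observer is called once when it throws 
StandbyException and the request is then served by the next namenode, whereas 
with RetriableException all attempts are spent on the same Observer.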



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

