[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time

Chao Sun (Jira) Fri, 06 Dec 2019 10:23:04 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990059#comment-16990059
 ]


Chao Sun commented on HDFS-15024:
---------------------------------

{quote}
Chao Sun I think msync is just one case; maybe the current problem is a 
common problem for the support of more than 2 NameNodes?
{quote}

Yes, you are correct. This is a more general problem for the multi-SBN feature, 
but I think we could optimize {{msync}} specifically to avoid the retry backoff. 

Regarding patch v1, it seems to handle only the first few retries: later on, 
once {{times}} increments beyond {{numNameNodes - 1}}, it will still do 
exponential backoff on all the SBNs.
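
To make the concern concrete, here is a minimal sketch of the two sleep-time rules under discussion. It is not the Hadoop source: the class {{FailoverSleepSketch}}, its method names, and the 500 ms base / 15000 ms cap are illustrative assumptions, the random jitter a real client would add is left out, and the patch v1 condition is inferred from the description above rather than copied from the patch.

```java
/** Simplified, deterministic sketch of failover sleep-time rules; not Hadoop code. */
final class FailoverSleepSketch {

    /** Capped exponential backoff, without jitter. */
    static long exponentialMillis(long baseMillis, int attempts, long capMillis) {
        // Clamp the shift so 1L << attempts cannot overflow for large counts.
        int shift = Math.min(attempts, 30);
        return Math.min(baseMillis * (1L << shift), capMillis);
    }

    /** Rule described in the issue: only the very first failover is free. */
    static long sleepFirstFailoverFree(int failovers, long baseMillis, long capMillis) {
        return failovers == 0 ? 0 : exponentialMillis(baseMillis, failovers, capMillis);
    }

    /**
     * Assumed shape of patch v1: failovers are free until the count reaches
     * numNameNodes - 1, after which exponential backoff applies again.
     */
    static long sleepWithNameNodeCount(int failovers, int numNameNodes,
                                       long baseMillis, long capMillis) {
        return failovers < numNameNodes - 1
            ? 0
            : exponentialMillis(baseMillis, failovers, capMillis);
    }

    public static void main(String[] args) {
        long base = 500;     // illustrative base sleep in milliseconds
        long cap = 15000;    // illustrative maximum sleep in milliseconds
        int numNameNodes = 3;
        for (int failovers = 0; failovers <= 5; failovers++) {
            System.out.printf("failovers=%d firstFree=%dms withCount=%dms%n",
                failovers,
                sleepFirstFailoverFree(failovers, base, cap),
                sleepWithNameNodeCount(failovers, numNameNodes, base, cap));
        }
    }
}
```

With three NameNodes, the second rule skips the sleep only while the failover count is below 2. Once the count passes that threshold, both rules return the same backoff, which is the behaviour noted above for later retries.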

> [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a 
> condition of calculation of sleep time
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15024
>                 URL: https://issues.apache.org/jira/browse/HDFS-15024
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.10.0, 3.3.0, 3.2.1
>            Reporter: huhaiyang
>            Assignee: huhaiyang
>            Priority: Major
>              Labels: multi-sbnn
>         Attachments: HDFS-15024.001.patch, client_error.log
>
>
> When we enable the Observer NameNode (ONN), the client configuration lists 
> three NameNodes, for example:
> <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn2,nn3,nn1</value>
> </property>
> Currently, 
> nn2 is in standby state
> nn3 is in observer state 
> nn1 is in active state
> When the user performs an HDFS operation, for example:
> ./bin/hadoop --loglevel debug fs 
> -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
>  -mkdir /user/haiyang1/test8
> The msync call needs to reach nn1.
> The client actually connects to nn2 first, and a failover is required.
> It then connects to nn3, which does not meet the requirements either, so 
> another failover is needed, but this time the client sleeps for a while 
> before failing over.
> Only after that sleep does the request finally succeed against nn1.
> In FailoverOnNetworkExceptionRetry#getFailoverOrRetrySleepTime, the current 
> default implementation calculates a sleep time once more than one failover 
> has been performed.
> I think it is more reasonable to use the number of NameNodes as a condition 
> when calculating the sleep time.
> That is, in the current test, the failover after connecting to nn3 should 
> not sleep, and the client should connect directly to the next NameNode.
> See client_error.log for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
