[
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18044799#comment-18044799
]
ASF GitHub Bot commented on HDFS-15419:
---------------------------------------
github-actions[bot] closed pull request #2082: HDFS-15419. RBF: Router should
retry communicate with NN when cluster is unavailable using configurable time
interval
URL: https://github.com/apache/hadoop/pull/2082
> RBF: Router should retry communicate with NN when cluster is unavailable
> using configurable time interval
> ---------------------------------------------------------------------------------------------------------
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: configuration, hdfs-client, rbf
> Reporter: bhji123
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When a cluster is unavailable, router -> namenode communication retries only
> once, with no delay between attempts, which is not reasonable.
> For example, my company runs several HDFS clusters with more than
> 1000 nodes, and we have encountered this problem. In some cases a cluster
> becomes unavailable briefly, for about 10 or 30 seconds; during that time
> almost all RPC requests to the router fail because the router retries only
> once, with no delay.
> It would be better to enhance the router retry strategy so that it retries
> communication with the NN using a configurable time interval and a
> configurable maximum number of retries.
>
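
The strategy requested above (a bounded number of retries with a fixed,
configurable pause between attempts) could look roughly like the sketch
below. This is only an illustration, not the code from pull request #2082:
the class name RouterRetrySketch, the method callWithRetry, and the
parameters maxRetries and intervalMs are all hypothetical, and the sketch
uses plain JDK classes rather than any Hadoop API.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper; not part of Hadoop or of the pull request. */
final class RouterRetrySketch {

    private RouterRetrySketch() {
    }

    /**
     * Invokes op once and, on IOException, retries it up to maxRetries
     * more times, sleeping intervalMs milliseconds between attempts.
     * Both values stand in for settings read from configuration.
     */
    static <T> T callWithRetry(Callable<T> op, int maxRetries, long intervalMs)
            throws Exception {
        if (maxRetries < 0 || intervalMs < 0) {
            throw new IllegalArgumentException("maxRetries and intervalMs must be >= 0");
        }
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                // Assumption: an IOException means the NN is temporarily unreachable.
                last = e;
                if (attempt < maxRetries) {
                    // Wait before the next attempt instead of retrying immediately.
                    Thread.sleep(intervalMs);
                }
            }
        }
        // Every attempt failed; surface the most recent failure to the caller.
        throw last;
    }
}
```

With maxRetries = 10 and intervalMs = 3000, for instance, a request would
keep trying for roughly 30 seconds before failing, which is about the
length of the brief outages described in the issue.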