[jira] [Commented] (HDDS-14389) [Website v2] [Docs] [Administrator Guide] OM HA, SCM HA failover behavior

Wei-Chiu Chuang (Jira) Fri, 23 Jan 2026 10:19:09 -0800


    [ 
https://issues.apache.org/jira/browse/HDDS-14389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18053984#comment-18053984
 ]


Wei-Chiu Chuang commented on HDDS-14389:
----------------------------------------

Oh, if we can, we should also add a section on client-to-datanode failover and 
retry:

 

Clients retry datanodes in order upon failure. For reading blocks, the client 
retries each datanode 3 times, with a 1-second pause in between, before moving 
to the next datanode. So, assuming 3 replicas, a client may retry up to 
3 * 3 = 9 times.

For writing blocks, the client retries each datanode 5 times, with no pause in 
between, before moving to the next datanode. So, assuming 3 replicas, a client 
may retry up to 3 * 5 = 15 times.
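
To make the retry arithmetic above concrete, here is a minimal sketch that computes the worst-case number of datanode attempts and an upper bound on the total pause time. It assumes 3 datanodes per block (which is where the "3 *" factor appears to come from) and that the pause is taken once per retry. The class and method names are hypothetical and are not part of Ozone.

```java
// Hypothetical helper, not an Ozone class: it only illustrates the arithmetic.
final class DatanodeRetryBudget {

    // Worst case: every datanode is retried the maximum number of times.
    static int worstCaseAttempts(int datanodes, int retriesPerDatanode) {
        return datanodes * retriesPerDatanode;
    }

    // Upper bound on time spent pausing, assuming one pause per retry.
    static long worstCasePauseSeconds(int datanodes, int retriesPerDatanode,
                                      int intervalSeconds) {
        return (long) worstCaseAttempts(datanodes, retriesPerDatanode)
            * intervalSeconds;
    }

    public static void main(String[] args) {
        int datanodes = 3; // assumption: three replicas

        // Reads: ozone.client.read.max.retries = 3, interval = 1 second.
        System.out.println("read attempts:  "
            + worstCaseAttempts(datanodes, 3));        // 9
        System.out.println("read pause (s): "
            + worstCasePauseSeconds(datanodes, 3, 1)); // 9

        // Writes: ozone.client.max.retries = 5, interval = 0 (no wait).
        System.out.println("write attempts: "
            + worstCaseAttempts(datanodes, 5));        // 15
        System.out.println("write pause (s): "
            + worstCasePauseSeconds(datanodes, 5, 0)); // 0
    }
}
```

The pause figure excludes the time each failed attempt itself takes (connection timeouts and so on), so the real worst-case latency would be higher.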

 
{code:java}
@Config(key = "ozone.client.max.retries",
    defaultValue = "5",
    description = "Maximum number of retries by Ozone Client on "
        + "encountering exception while writing a key",
    tags = ConfigTag.CLIENT)
private int maxRetryCount = 5;

@Config(key = "ozone.client.retry.interval",
    defaultValue = "0",
    description =
        "Indicates the time duration a client will wait before retrying a "
            + "write key request on encountering an exception. By default "
            + "there is no wait",
    tags = ConfigTag.CLIENT)
private int retryInterval = 0;

@Config(key = "ozone.client.read.max.retries",
    defaultValue = "3",
    description = "Maximum number of retries by Ozone Client on "
        + "encountering connectivity exception when reading a key.",
    tags = ConfigTag.CLIENT)
private int maxReadRetryCount = 3;

@Config(key = "ozone.client.read.retry.interval",
    defaultValue = "1",
    description =
        "Indicates the time duration in seconds a client will wait "
            + "before retrying a read key request on encountering "
            + "a connectivity exception from Datanodes. "
            + "By default the interval is 1 second",
    tags = ConfigTag.CLIENT)
private int readRetryInterval = 1; {code}
----
{code:xml}
<property>
  <name>ozone.client.failover.max.attempts</name>
  <value>500</value>
  <description>
    Expert only. Ozone RpcClient attempts talking to each OzoneManager
    ipc.client.connect.max.retries (default = 10) number of times before
    failing over to another OzoneManager, if available. This parameter
    represents the number of times per request the client will failover
    before giving up. This value is kept high so that client does not
    give up trying to connect to OMs easily.
  </description>
</property>
<property>
  <name>ozone.client.wait.between.retries.millis</name>
  <value>2000</value>
  <description>
    Expert only. The time to wait, in milliseconds, between retry attempts
    to contact OM. Wait time increases linearly if same OM is retried
    again. If retrying on multiple OMs proxies in round robin fashion, the
    wait time is introduced after all the OM proxies have been attempted once.
  </description>
</property>
<property>
  <name>ozone.om.admin.protocol.max.retries</name>
  <value>20</value>
  <tag>OM, MANAGEMENT</tag>
  <description>
    Expert only. The maximum number of retries for Ozone Manager Admin
    protocol on each OM.
  </description>
</property>
<property>
  <name>ozone.om.admin.protocol.wait.between.retries</name>
  <value>1000</value>
  <tag>OM, MANAGEMENT</tag>
  <description>
    Expert only. The time to wait, in milliseconds, between retry attempts
    for Ozone Manager Admin protocol.
  </description>
</property> {code}
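
For a sense of scale, the sketch below estimates how long a client could spend waiting between OM failover attempts with the defaults shown above. It is only a rough model: the first method assumes a fixed wait per failover attempt, and the second assumes that "increases linearly" means the n-th consecutive retry on the same OM waits n times the base interval. Both are assumptions for illustration, not a description of the actual implementation, and the class and method names are hypothetical.

```java
// Hypothetical helper, not an Ozone class: a rough model of failover waits.
final class OmFailoverWaitEstimate {

    // Total wait if every failover attempt is followed by the same fixed wait.
    static long fixedWaitMillis(int maxAttempts, long waitBetweenRetriesMillis) {
        return (long) maxAttempts * waitBetweenRetriesMillis;
    }

    // Total wait if the n-th consecutive retry on the same OM waits n * base
    // (an assumed reading of "wait time increases linearly").
    static long linearWaitMillis(int consecutiveRetries, long baseMillis) {
        long total = 0;
        for (int n = 1; n <= consecutiveRetries; n++) {
            total += n * baseMillis;
        }
        return total;
    }

    public static void main(String[] args) {
        // Defaults: ozone.client.failover.max.attempts = 500,
        // ozone.client.wait.between.retries.millis = 2000.
        long fixed = fixedWaitMillis(500, 2000);
        System.out.println("fixed wait (ms): " + fixed); // 1000000

        // Three consecutive retries on the same OM: 2000 + 4000 + 6000.
        System.out.println("linear wait (ms): "
            + linearWaitMillis(3, 2000)); // 12000
    }
}
```

Under the fixed-wait assumption, the defaults add up to about 1000 seconds (roughly 17 minutes) of waiting before the client gives up, which is consistent with the description's note that the value is kept high on purpose.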
 

 

> [Website v2] [Docs] [Administrator Guide] OM HA, SCM HA failover behavior
> -------------------------------------------------------------------------
>
>                 Key: HDDS-14389
>                 URL: https://issues.apache.org/jira/browse/HDDS-14389
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: documentation
>            Reporter: Wei-Chiu Chuang
>            Assignee: Gargi Jaiswal
>            Priority: Major
>
> Our OM HA and SCM HA docs cover how multiple OMs reach consensus using Ratis.
> However, they miss another important part of the story: client failover 
> behavior.
> HadoopRpcOMFailoverProxyProvider: used when the client-to-OM transport is 
> Hadoop RPC. The failover or retry may happen if the OM (1) is not reachable, 
> (2) is not a leader, or (3) is a leader but not ready to accept requests.
> The client will fail over up to 500 times 
> (ozone.client.failover.max.attempts), waiting 2 seconds between failover 
> retries (ozone.client.wait.between.retries.millis). If the OM is not aware of the 
> current leader, client will try the next OM in round-robin fashion; 
> otherwise, client will retry contacting the current leader.
> Additionally, it is crucial to ensure clients and OM have consistent node 
> mapping configurations, otherwise failover may not reach the leader OM.
> GrpcOMFailoverProxyProvider: if the client-to-OM transport is gRPC, the 
> behavior is largely the same. But I don't have much experience with it, so 
> I'll just leave it as is.
> Similarly, client (client, OM or Datanode) to SCM failover is controlled by a 
> series of configuration properties in SCMClientConfig: 
> hdds.scmclient.rpc.timeout, hdds.scmclient.max.retry.timeout, 
> hdds.scmclient.failover.max.retry, hdds.scmclient.failover.retry.interval.
> Having these behaviors documented will help users troubleshoot problems.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
