Guo Wei created HDFS-17769:
------------------------------
Summary: Allows client to actively retry to Active NameNode when
the Observer NameNode is too far behind client state id.
Key: HDFS-17769
URL: https://issues.apache.org/jira/browse/HDFS-17769
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 3.4.1, 3.3.6, 3.3.4
Reporter: Guo Wei
Fix For: 3.4.2
Attachments: 1.png, 2.png, 3.png
When the Router forwards read requests to the Observer NameNode and the cluster
is under a heavy write workload, the Observer may fail to keep pace with edit
log synchronization. This can still happen even when the
dfs.ha.tail-edits.in-progress parameter is enabled.
The lag triggers "RetriableException: Observer Node is too far behind" errors.
In particular, when the client parameter ipc.client.ping is set to true, the
client keeps waiting and retrying, so the application cannot obtain the data it
needs in a timely manner. In this situation, we should consider letting the
Active NameNode handle the request instead.
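
The proposed behavior can be illustrated with a minimal, self-contained sketch.
It does not use Hadoop classes and is not the actual patch: the class
ObserverFallbackSketch, the exception ObserverTooFarBehindException, the method
readWithFallback, and the maxObserverAttempts parameter are all hypothetical
names chosen for illustration. It assumes the client gives the Observer a
bounded number of attempts and then sends the read to the Active NameNode
instead of waiting indefinitely.

```java
import java.util.function.Supplier;

public class ObserverFallbackSketch {

    /** Hypothetical stand-in for the "Observer Node is too far behind" error. */
    public static class ObserverTooFarBehindException extends RuntimeException {
        public ObserverTooFarBehindException(String message) {
            super(message);
        }
    }

    /**
     * Tries the observer read up to maxObserverAttempts times. If every
     * attempt reports that the observer is too far behind, the read is
     * sent to the active instead of being retried further.
     */
    public static <T> T readWithFallback(Supplier<T> observerRead,
                                         Supplier<T> activeRead,
                                         int maxObserverAttempts) {
        for (int attempt = 1; attempt <= maxObserverAttempts; attempt++) {
            try {
                return observerRead.get();
            } catch (ObserverTooFarBehindException e) {
                // Observer has not caught up yet; use the next attempt, if any.
            }
        }
        // Retry budget exhausted: let the active serve the read.
        return activeRead.get();
    }

    public static void main(String[] args) {
        // Simulated observer that never catches up with the client state id.
        Supplier<String> laggingObserver = () -> {
            throw new ObserverTooFarBehindException("Observer Node is too far behind");
        };
        Supplier<String> active = () -> "result from active";

        // prints "result from active"
        System.out.println(readWithFallback(laggingObserver, active, 2));
    }
}
```

Bounding the Observer attempts trades some extra read load on the Active
NameNode for predictable client latency. Whether that trade-off is acceptable
depends on how often the Observer falls behind in a given cluster.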
Below are the errors we observed and the verification of our fix:
Observer state id too far behind the Active: 1.png
RetriableException: 2.png
Fix verification: 3.png