[ https://issues.apache.org/jira/browse/HDFS-17769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-17769:
----------------------------------
    Labels: pull-request-available  (was: )

> Allows client to actively retry to Active NameNode when the Observer NameNode is too far behind client state id.
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17769
>                 URL: https://issues.apache.org/jira/browse/HDFS-17769
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.4, 3.3.6, 3.4.1
>            Reporter: Guo Wei
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.2
>
>         Attachments: 1.png, 2.png, 3.png
>
>
> When we use the Router to forward read requests to the Observer and the
> cluster is under a heavy write workload, Observer nodes may fail to keep
> pace with edit log synchronization. This can still occur even when the
> dfs.ha.tail-edits.in-progress parameter is configured.
> It triggers "RetriableException: Observer Node is too far behind" errors.
> In particular, when the client's ipc.client.ping parameter is set to true,
> the client keeps waiting and retrying, so the business cannot obtain the
> desired data in a timely manner. We should consider having the Active
> NameNode handle the request in this situation.
> Here are some of the errors we observed and the verification of the fix:
> The state id of the Observer is too far behind the Active: 1.png
>
> RetriableException: 2.png
>
> Fix verification: 3.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org