[ https://issues.apache.org/jira/browse/HDFS-17769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17943545#comment-17943545 ]

ASF GitHub Bot commented on HDFS-17769:
---------------------------------------

gyz-web opened a new pull request, #7602:
URL: https://github.com/apache/hadoop/pull/7602

When we use the Router to forward read requests to the Observer and the
cluster is under a heavy write workload, Observer nodes may fail to keep pace
with edit log synchronization. This can still happen even when the
dfs.ha.tail-edits.in-progress parameter is configured. It triggers
RetriableException errors with the message "Observer Node is too far behind".
This is especially problematic when the client's ipc.client.ping parameter is
set to true: the client keeps waiting and retrying, which can prevent the
business from obtaining the data it needs in a timely manner. In this
situation, we should consider having the Active NameNode handle the request
instead.
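The routing decision discussed above can be sketched as a small model. This is a hypothetical Python illustration, not the HDFS implementation or the patch in PR #7602. It assumes the Observer compares the client's state ID (the last transaction ID the client has seen) with its own last applied transaction ID. It also introduces an invented lag threshold `max_lag_txns` and an invented `fallback_to_active` switch, neither of which is a real Hadoop configuration key. These contrast the current retry behavior with the proposed fallback to the Active NameNode.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    OBSERVER = "observer"        # lag is acceptable: serve the read on the Observer
    RETRY_OBSERVER = "retry"     # behavior in the issue: RetriableException, client retries
    ACTIVE = "active"            # behavior proposed in the issue: send the read to the Active


@dataclass(frozen=True)
class ObserverStatus:
    # Hypothetical field: last transaction ID the Observer has applied from the edit log.
    last_applied_state_id: int


def route_read(client_state_id: int,
               observer: ObserverStatus,
               max_lag_txns: int,
               fallback_to_active: bool) -> Route:
    """Decide where a read goes based on how far the Observer lags the client.

    client_state_id is the last transaction ID the client has seen.
    max_lag_txns and fallback_to_active are hypothetical knobs for this sketch,
    not Hadoop configuration keys.
    """
    lag = client_state_id - observer.last_applied_state_id
    if lag <= max_lag_txns:
        # Observer is at most max_lag_txns behind: treat it as able to catch up and serve.
        return Route.OBSERVER
    if fallback_to_active:
        # Proposed: stop retrying a lagging Observer and let the Active answer the read.
        return Route.ACTIVE
    # Existing behavior: reject with a retriable error; the client retries the Observer.
    return Route.RETRY_OBSERVER


if __name__ == "__main__":
    observer = ObserverStatus(last_applied_state_id=1_000)
    print(route_read(1_050, observer, max_lag_txns=100, fallback_to_active=True))   # Route.OBSERVER
    print(route_read(6_000, observer, max_lag_txns=100, fallback_to_active=False))  # Route.RETRY_OBSERVER
    print(route_read(6_000, observer, max_lag_txns=100, fallback_to_active=True))   # Route.ACTIVE
```

With `fallback_to_active=False`, a client facing a sustained write load can keep landing in `RETRY_OBSERVER`. That is the waiting-and-retrying pattern reported above. Enabling the fallback bounds read latency at the cost of extra read load on the Active.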
   
Here are some of the errors we observed, along with verification of the fix:

1. The state ID of the Observer is too far behind that of the Active:

![Image 1](https://github.com/user-attachments/assets/7e9df660-e67d-4aaa-878a-25e083afaa0d)

2. RetriableException:

![Image 2](https://github.com/user-attachments/assets/a691d613-c1b9-4c81-9282-cd10fe337029)


3. Verification of the fix:

![Image 3](https://github.com/user-attachments/assets/ab235d1d-95ce-481f-9ee8-7220db9d8897)
   




> Allows client to actively retry to Active NameNode when the Observer NameNode 
> is too far behind client state id.
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17769
>                 URL: https://issues.apache.org/jira/browse/HDFS-17769
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.4, 3.3.6, 3.4.1
>            Reporter: Guo Wei
>            Priority: Major
>             Fix For: 3.4.2
>
>         Attachments: 1.png, 2.png, 3.png
>
>
> When we use the Router to forward read requests to the Observer and the
> cluster is under a heavy write workload, Observer nodes may fail to keep pace
> with edit log synchronization. This can still happen even when the
> dfs.ha.tail-edits.in-progress parameter is configured.
> It triggers RetriableException errors with the message "Observer Node is too
> far behind". This is especially problematic when the client's ipc.client.ping
> parameter is set to true: the client keeps waiting and retrying, which can
> prevent the business from obtaining the data it needs in a timely manner. In
> this situation, we should consider having the Active NameNode handle the
> request instead.
> Here are some of the errors we observed, along with verification of the fix:
> The state ID of the Observer is too far behind that of the Active: 1.png
>
> RetriableException: 2.png
>
> Verification of the fix: 3.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
