[ https://issues.apache.org/jira/browse/HDFS-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xing Lin updated HDFS-17030:
----------------------------
    Description: 
When namenode HA is enabled and a standby NN is not responsive, we have 
observed that it can take a long time to serve a request, even though we have 
a healthy observer or active NN. 

Basically, when a standby is down, the RPC client will (re)try to create a 
socket connection to that standby for _ipc.client.connect.timeout * 
ipc.client.connect.max.retries.on.timeouts_ before giving up. When we take a 
heap dump at a standby, the NN still accepts the socket connection but it won't 
send responses to these RPC requests, so we time out after 
_ipc.client.rpc-timeout.ms_. This adds significant latency. For clusters at 
LinkedIn, we set _ipc.client.rpc-timeout.ms_ to 120 seconds, so a request 
takes more than 2 minutes to complete whenever we take a heap dump at a 
standby. This has been causing user job failures. 
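
For a rough sense of the numbers involved, here is a purely illustrative sketch 
that reads these settings through Hadoop's Configuration API and prints the 
delay a client can accumulate on a single unresponsive NN. The default values 
in the code are assumptions for illustration only.

{code:java}
import org.apache.hadoop.conf.Configuration;

/** Illustrative only: estimate how long a client can stall on one bad NN. */
public class StallEstimate {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Connection-setup phase: repeated socket connects to a down standby.
    long connectTimeoutMs =
        conf.getLong("ipc.client.connect.timeout", 20000L);            // assumed default
    int connectRetries =
        conf.getInt("ipc.client.connect.max.retries.on.timeouts", 45); // assumed default

    // Request phase: the NN accepts the connection but never answers
    // (e.g. while a heap dump is being taken).
    long rpcTimeoutMs =
        conf.getLong("ipc.client.rpc-timeout.ms", 120000L);            // 120 s, as used at LinkedIn

    System.out.println("worst-case connect phase (ms): "
        + connectTimeoutMs * connectRetries);
    System.out.println("worst-case unanswered RPC (ms): " + rpcTimeoutMs);
  }
}
{code}

With values like these, a single unresponsive standby can add minutes of 
latency before the proxy ever probes the next NN.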

We could set _ipc.client.rpc-timeout.ms_ to a smaller value when sending 
getHAServiceState requests in ObserverReaderProxy (for user RPC requests, we 
would still use the original value from the config). However, the IPC client 
keys connections partly by rpc timeout, so a second timeout value would double 
the number of socket connections between clients and the NN, which is a 
deal-breaker. 

The proposal is to add a timeout on getHAServiceState() calls in 
ObserverReaderProxy: we only wait up to that timeout for an NN to respond with 
its HA state. Once the timeout expires, we move on and probe the next NN. 
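
A minimal sketch of the idea (not the actual patch; the helper name and 
structure are made up for illustration): the HA-state probe is submitted to an 
executor and we wait at most the configured timeout before treating that NN as 
unavailable and probing the next one.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.hadoop.ha.HAServiceProtocol.HAServiceState;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;

/** Illustrative sketch only. */
class HaStateProber {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  /**
   * Ask one NN for its HA state, waiting at most timeoutMs. Returns null when
   * the NN does not answer in time, so the caller can move on to the next NN
   * instead of blocking for the full rpc timeout.
   */
  HAServiceState probeWithTimeout(ClientProtocol namenode, long timeoutMs) {
    Future<HAServiceState> future = executor.submit(namenode::getHAServiceState);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true);   // give up on this NN for now
      return null;
    } catch (Exception e) {
      return null;           // connection failures are handled the same way
    }
  }
}
{code}

The timeout value itself could be made configurable on the client side; any 
specific key name here would be an assumption at this point.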

 

  was:
When namenode HA is enabled and a standby NN is not responsive, we have 
observed that it can take a long time to serve a request, even though we have 
a healthy observer or active NN. 

Basically, when a standby is down, the RPC client will (re)try to create a 
socket connection to that standby for _ipc.client.connect.timeout * 
ipc.client.connect.max.retries.on.timeouts_ before giving up. When we take a 
heap dump at a standby, the NN still accepts the socket connection but it won't 
send responses to these RPC requests, so we time out after 
_ipc.client.rpc-timeout.ms_. This adds significant latency. For clusters at 
LinkedIn, we set _ipc.client.rpc-timeout.ms_ to 120 seconds, so a request 
takes more than 2 minutes to complete whenever we take a heap dump at a 
standby. This has been causing user job failures. 

We could set _ipc.client.rpc-timeout.ms_ to a smaller value when sending 
getHAServiceState requests in ObserverReaderProxy (for user RPC requests, we 
would still use the original value from the config). However, that would double 
the number of socket connections between clients and the NN, which is a 
deal-breaker. 

The proposal is to add a timeout on getHAServiceState() calls in 
ObserverReaderProxy: we only wait up to that timeout for an NN to respond with 
its HA state. Once the timeout expires, we move on to the next NN. 

 


> Limit wait time for getHAServiceState in ObserverReaderProxy
> ------------------------------------------------------------
>
>                 Key: HDFS-17030
>                 URL: https://issues.apache.org/jira/browse/HDFS-17030
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.4.0
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Minor
>              Labels: pull-request-available
>
> When namenode HA is enabled and a standby NN is not responsive, we have 
> observed that it can take a long time to serve a request, even though we have 
> a healthy observer or active NN. 
> Basically, when a standby is down, the RPC client will (re)try to create a 
> socket connection to that standby for _ipc.client.connect.timeout * 
> ipc.client.connect.max.retries.on.timeouts_ before giving up. When we take a 
> heap dump at a standby, the NN still accepts the socket connection but it 
> won't send responses to these RPC requests, so we time out after 
> _ipc.client.rpc-timeout.ms_. This adds significant latency. For clusters 
> at LinkedIn, we set _ipc.client.rpc-timeout.ms_ to 120 seconds, so a 
> request takes more than 2 minutes to complete whenever we take a heap dump 
> at a standby. This has been causing user job failures. 
> We could set _ipc.client.rpc-timeout.ms_ to a smaller value when sending 
> getHAServiceState requests in ObserverReaderProxy (for user RPC requests, we 
> would still use the original value from the config). However, that would 
> double the number of socket connections between clients and the NN, which is 
> a deal-breaker. 
> The proposal is to add a timeout on getHAServiceState() calls in 
> ObserverReaderProxy: we only wait up to that timeout for an NN to respond 
> with its HA state. Once the timeout expires, we move on and probe the 
> next NN. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
