[ 
https://issues.apache.org/jira/browse/HDDS-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-4068:
---------------------------------
    Summary: Client should not retry same OM on network connection failure  
(was: Client Retry for ipc.client.connect.max.retries when first OM is down)

> Client should not retry same OM on network connection failure
> -------------------------------------------------------------
>
>                 Key: HDDS-4068
>                 URL: https://issues.apache.org/jira/browse/HDDS-4068
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: OM HA, Ozone Client
>            Reporter: Bharat Viswanadham
>            Assignee: Hanisha Koneru
>            Priority: Major
>
> Currently the client connects to OM1 first: if OM1 is the leader the 
> request proceeds; otherwise the client tries the next OM, and so on. If OM1 
> is down, the client retries the connection 50 times when 
> ipc.client.connect.max.retries is set to 50. With 
> ipc.client.connect.retry.interval at its default of 1 second, about 50 
> seconds are spent retrying before the client moves on to the next OM. 
> I think the client -> OM connection should have its own retry policy, so 
> that if the first OM is down the user does not need to wait 50 seconds for 
> the request to complete.
> Because ipc.client.connect.retry.interval and ipc.client.connect.max.retries 
> are common configurations for RPC, creating a new default retry policy with 
> smaller values would be nice. 
> {code:java}
> 20/08/06 00:21:29 INFO ipc.Client: Retrying connect to server: 
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 0 time(s); 
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, 
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:30 INFO ipc.Client: Retrying connect to server: 
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 1 time(s); 
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, 
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:31 INFO ipc.Client: Retrying connect to server: 
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 2 time(s); 
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, 
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:32 INFO ipc.Client: Retrying connect to server: 
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 3 time(s); 
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, 
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:33 INFO ipc.Client: Retrying connect to server: 
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 4 time(s); 
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, 
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:34 INFO ipc.Client: Retrying connect to server: 
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 5 time(s); 
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, 
> sleepTime=1000 MILLISECONDS)
> {code}
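
The behaviour requested in the summary can be illustrated with a minimal sketch. It is not the Ozone client code or the eventual fix: the class OmFailoverSketch, the interface OmCall, and both methods are hypothetical names made up for this illustration, and OM addresses are plain strings. It assumes a connection failure surfaces as java.net.ConnectException, and shows the client moving to the next OM on such a failure instead of retrying the same one.

```java
import java.net.ConnectException;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Ozone or Hadoop.
final class OmFailoverSketch {

    /** Stand-in for a single RPC attempt against one OM address. */
    interface OmCall<T> {
        T invoke(String omAddress) throws ConnectException;
    }

    /**
     * Tries every OM at most once per pass. A connection failure moves on to
     * the next OM right away instead of retrying the same address.
     */
    static <T> T callWithFailover(List<String> omAddresses, OmCall<T> call, int maxPasses) {
        ConnectException last = null;
        for (int pass = 0; pass < maxPasses; pass++) {
            for (String om : omAddresses) {
                try {
                    return call.invoke(om);
                } catch (ConnectException e) {
                    last = e; // remember the failure and fail over to the next OM
                }
            }
        }
        throw new IllegalStateException("no OM reachable", last);
    }

    /** Worst-case seconds spent on one unreachable server with fixed-sleep retries. */
    static long worstCaseWaitSeconds(int maxRetries, long retryIntervalMillis) {
        return maxRetries * retryIntervalMillis / 1000;
    }
}
```

With the values from the description, worstCaseWaitSeconds(50, 1000) gives the 50 seconds spent on a down OM1 before failover. With callWithFailover, a down OM1 costs only one failed connection attempt before OM2 is tried.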


