[ https://issues.apache.org/jira/browse/HDDS-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hanisha Koneru updated HDDS-4068:
---------------------------------
    Summary: Client should not retry same OM on network connection failure  (was: Client Retry for ipc.client.connect.max.retries when first OM is down)

> Client should not retry same OM on network connection failure
> -------------------------------------------------------------
>
>                 Key: HDDS-4068
>                 URL: https://issues.apache.org/jira/browse/HDDS-4068
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: OM HA, Ozone Client
>            Reporter: Bharat Viswanadham
>            Assignee: Hanisha Koneru
>            Priority: Major
>
> Currently, the client's retry logic for OM works as follows: it connects to
> OM1; if OM1 is the leader, the request proceeds, otherwise the client tries
> the next OM, and so on. If OM1 is down, the client retries the connection 50
> times when ipc.client.connect.max.retries is set to 50. With
> ipc.client.connect.retry.interval defaulting to 1 second, a total of 50
> seconds is spent retrying before the client moves to the next OM.
> I think client -> OM should have its own retry policy, so that if the first
> OM is down, the user does not need to wait 50 seconds for the request to
> complete.
> As ipc.client.connect.retry.interval and ipc.client.connect.max.retries are
> common configurations for RPC, creating a new default retry policy with
> smaller values would be nice.
> {code:java}
> 20/08/06 00:21:29 INFO ipc.Client: Retrying connect to server:
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 0 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:30 INFO ipc.Client: Retrying connect to server:
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 1 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:31 INFO ipc.Client: Retrying connect to server:
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 2 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:32 INFO ipc.Client: Retrying connect to server:
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 3 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:33 INFO ipc.Client: Retrying connect to server:
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 4 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:34 INFO ipc.Client: Retrying connect to server:
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 5 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> {code}
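
The behaviour requested in the new summary can be sketched roughly as below. This is a minimal, self-contained illustration and not the Ozone client code: OmFailoverSketch, OmCall, Outcome, callWithFailover, worstCaseConnectWaitMillis, and the om1/om2/om3 addresses are all hypothetical names made up for the example. Only port 9862 and the 50 x 1 second figures come from the report. The idea is that a connection failure moves the client straight to the next OM instead of retrying the same node.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: fail over to the next OM as soon as a connect fails. */
final class OmFailoverSketch {

  /** Hypothetical stand-in for one RPC attempt against a single OM. */
  interface OmCall<T> {
    T invoke(String omAddress) throws IOException;
  }

  /** Result of a call plus the OMs that were contacted, in order. */
  static final class Outcome<T> {
    final T value;
    final List<String> contacted;

    Outcome(T value, List<String> contacted) {
      this.value = value;
      this.contacted = contacted;
    }
  }

  /**
   * Tries each OM in round-robin order. A ConnectException is treated as
   * "this node is unreachable": the loop moves to the next OM immediately
   * and never retries the same node back to back.
   */
  static <T> Outcome<T> callWithFailover(
      List<String> omAddresses, OmCall<T> call, int maxFailovers)
      throws IOException {
    if (omAddresses.isEmpty()) {
      throw new IllegalArgumentException("at least one OM address is required");
    }
    List<String> contacted = new ArrayList<>();
    IOException last = null;
    for (int attempt = 0; attempt <= maxFailovers; attempt++) {
      String om = omAddresses.get(attempt % omAddresses.size());
      contacted.add(om);
      try {
        return new Outcome<>(call.invoke(om), contacted);
      } catch (ConnectException e) {
        last = e; // unreachable node: fall through to the next OM
      }
    }
    throw new IOException(
        "No OM reachable after " + contacted.size() + " attempts", last);
  }

  /** Time spent on one dead node under a fixed-sleep connect retry policy. */
  static long worstCaseConnectWaitMillis(int maxRetries, long retryIntervalMillis) {
    return maxRetries * retryIntervalMillis;
  }

  public static void main(String[] args) throws IOException {
    List<String> oms = Arrays.asList("om1:9862", "om2:9862", "om3:9862");
    // Simulate om1 being down; any other OM answers.
    Outcome<String> out = callWithFailover(oms, om -> {
      if (om.startsWith("om1")) {
        throw new ConnectException("connection refused: " + om);
      }
      return "ok from " + om;
    }, 5);
    System.out.println(out.value + ", contacted " + out.contacted);
    System.out.println("same-node retry wait (ms): "
        + worstCaseConnectWaitMillis(50, 1000L));
  }
}
```

With the values from the report (maxRetries=50, sleepTime=1000 ms), the same-node policy spends 50 x 1000 ms = 50 seconds on a dead OM1, whereas the sketch reaches the second OM on its second attempt.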