[jira] [Updated] (HDDS-4068) Client Retry for ipc.client.connect.max.retries when first OM is down

Bharat Viswanadham (Jira) Wed, 05 Aug 2020 17:27:05 -0700


     [ https://issues.apache.org/jira/browse/HDDS-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Bharat Viswanadham updated HDDS-4068:
-------------------------------------
    Description: 
Currently, the client-to-OM retry logic tries to connect to OM1 first; if OM1 
is the leader, the request proceeds, otherwise the client tries the next OM, 
and so on. If OM1 is down, the client retries 50 times when 
ipc.client.connect.max.retries is set to 50, and with 
ipc.client.connect.retry.interval at its default of 1 second, a total of 50 
seconds is spent retrying before the client moves on to the next OM. 
I think the client -> OM connection should have its own retry policy, so that 
if the first OM is down, the user does not have to wait 50 seconds before the 
request can complete.

Because ipc.client.connect.retry.interval and ipc.client.connect.max.retries 
are common RPC configurations, it would be better to create a new default 
retry policy with smaller values. 



{code:java}
20/08/06 00:21:29 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
20/08/06 00:21:30 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
20/08/06 00:21:31 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
20/08/06 00:21:32 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
20/08/06 00:21:33 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
20/08/06 00:21:34 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
{code}
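
The wait described above can be illustrated with a small, self-contained sketch. It is not Ozone or Hadoop code: the class OmFailoverSketch, the Connector interface, the node names om1/om2/om3, and the smaller value of 3 retries are all hypothetical, chosen only to compare the time spent on a down OM under the shared settings (50 retries, 1000 ms) with the time spent under a smaller client -> OM budget. Sleeps are simulated by adding up the interval, so the example runs instantly.

```java
import java.util.List;

public class OmFailoverSketch {

    /** Hypothetical stand-in for a real RPC connection attempt. */
    public interface Connector {
        boolean tryConnect(String node);
    }

    /** Outcome of a run: the node reached (or null) and the simulated wait. */
    public static final class Result {
        public final String node;
        public final long waitedMillis;

        Result(String node, long waitedMillis) {
            this.node = node;
            this.waitedMillis = waitedMillis;
        }
    }

    /** Worst-case wait on one unreachable node before moving to the next. */
    public static long worstCaseWaitMillis(int maxRetries, long retryIntervalMillis) {
        return (long) maxRetries * retryIntervalMillis;
    }

    /**
     * Tries each node in order. A node gets up to maxRetries attempts with a
     * fixed (simulated) sleep after each failure; after that the next node
     * is tried.
     */
    public static Result connectWithFailover(List<String> nodes, int maxRetries,
            long retryIntervalMillis, Connector connector) {
        long waited = 0;
        for (String node : nodes) {
            for (int tried = 0; tried < maxRetries; tried++) {
                if (connector.tryConnect(node)) {
                    return new Result(node, waited);
                }
                waited += retryIntervalMillis; // simulated sleep, no real delay
            }
        }
        return new Result(null, waited);
    }

    public static void main(String[] args) {
        List<String> oms = List.of("om1", "om2", "om3"); // hypothetical node names
        Connector om1Down = node -> !node.equals("om1"); // only om1 is unreachable

        // Shared RPC settings from the description: 50 retries, 1000 ms apart.
        Result shared = connectWithFailover(oms, 50, 1000L, om1Down);
        System.out.println(shared.node + " after " + shared.waitedMillis + " ms");
        // prints "om2 after 50000 ms"

        // Assumed smaller client -> OM budget: 3 retries, 1000 ms apart.
        Result smaller = connectWithFailover(oms, 3, 1000L, om1Down);
        System.out.println(smaller.node + " after " + smaller.waitedMillis + " ms");
        // prints "om2 after 3000 ms"
    }
}
```

In both runs the request ends up on om2; only the time spent on the down node changes, which is the motivation for giving client -> OM connections their own, smaller retry settings.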


  was:
Currently, the client-to-OM retry logic tries to connect to OM1 first; if OM1 
is the leader, the request proceeds, otherwise the client tries the next OM, 
and so on. If OM1 is down, the client retries 50 times when 
ipc.client.connect.max.retries is set to 50 and 
ipc.client.connect.retry.interval is at its default of 1 second. I think the 
client -> OM connection should have its own retry policy, so that if the first 
OM is down, the user does not have to wait 50 seconds before the request can 
complete.

Because ipc.client.connect.retry.interval and ipc.client.connect.max.retries 
are common RPC configurations, it would be better to create a new default 
retry policy with smaller values. 



{code:java}
20/08/06 00:21:29 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
20/08/06 00:21:30 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
20/08/06 00:21:31 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
20/08/06 00:21:32 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
20/08/06 00:21:33 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
20/08/06 00:21:34 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
{code}



> Client Retry for ipc.client.connect.max.retries when first OM is down
> ---------------------------------------------------------------------
>
>                 Key: HDDS-4068
>                 URL: https://issues.apache.org/jira/browse/HDDS-4068
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Bharat Viswanadham
>            Priority: Major
>
> Currently, the client-to-OM retry logic tries to connect to OM1 first; if 
> OM1 is the leader, the request proceeds, otherwise the client tries the next 
> OM, and so on. If OM1 is down, the client retries 50 times when 
> ipc.client.connect.max.retries is set to 50, and with 
> ipc.client.connect.retry.interval at its default of 1 second, a total of 50 
> seconds is spent retrying before the client moves on to the next OM. 
> I think the client -> OM connection should have its own retry policy, so 
> that if the first OM is down, the user does not have to wait 50 seconds 
> before the request can complete.
> Because ipc.client.connect.retry.interval and ipc.client.connect.max.retries 
> are common RPC configurations, it would be better to create a new default 
> retry policy with smaller values. 
> {code:java}
> 20/08/06 00:21:29 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:30 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:31 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:32 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:33 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:34 INFO ipc.Client: Retrying connect to server: bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
