GitHub user aledsage opened a pull request:
https://github.com/apache/brooklyn-server/pull/448
BROOKLYN-394: increase jclouds retry/backoff time
Question: Are 500ms and 6 retries a sensible level? It feels to me like a
large backoff is good for API calls to a cloud. I can see this might slow
things down in some situations (e.g. when the failure was a transient
connectivity problem), but that seems unlikely to happen often. In all the
important cases I can think of, a larger backoff + retry time seems desirable.
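For reference, here is a minimal sketch of how values like these could be expressed as jclouds property overrides. The two key names are assumed to match the constants in `org.jclouds.Constants`; the class and method names are hypothetical, and this is not necessarily how the change in this PR is wired into Brooklyn.

```java
import java.util.Properties;

class RetryOverrides {
    // Key names assumed to match org.jclouds.Constants
    // (PROPERTY_RETRY_DELAY_START and PROPERTY_MAX_RETRIES).
    static final String RETRY_DELAY_START = "jclouds.retries-delay-start";
    static final String MAX_RETRIES = "jclouds.max-retries";

    /** Builds overrides for the initial retry delay (ms) and the retry limit. */
    static Properties retryOverrides(long delayStartMs, int maxRetries) {
        Properties props = new Properties();
        props.setProperty(RETRY_DELAY_START, Long.toString(delayStartMs));
        props.setProperty(MAX_RETRIES, Integer.toString(maxRetries));
        return props;
    }

    public static void main(String[] args) {
        // The values under discussion: 500ms initial delay, 6 retries.
        Properties overrides = retryOverrides(500, 6);
        overrides.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

A `Properties` object like this would typically be handed to jclouds when the context is built (e.g. via `ContextBuilder.overrides(...)`).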
When running `testCreateMany` to provision 20 VMs concurrently in AWS,
I triggered rate-limiting on the `RunInstances` calls, getting back
`503 Service Unavailable` for 6 of the 20 VMs:
```
$ grep -E "JavaUrlHttpCommandExecutorService.*Receiving.* 503 Service Unavailable" brooklyn.debug.log
2016-11-20 21:41:07,014 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-7]: Receiving response 305126632: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:07,027 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-17]: Receiving response -202425525: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:07,181 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-20]: Receiving response 1461817670: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:07,902 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-7]: Receiving response -412329992: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:07,951 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-17]: Receiving response -2106831550: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:08,094 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-20]: Receiving response -1404718861: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:08,189 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-13]: Receiving response -1425449702: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:09,141 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-13]: Receiving response -1388229651: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:09,575 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-11]: Receiving response 1776862310: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:11,419 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-15]: Receiving response 1334001839: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:11,695 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-13]: Receiving response 1602574625: HTTP/1.1 503 Service
Unavailable
```
Here's the output for one of them:
```
2016-11-20 21:41:07,774 DEBUG o.j.r.i.InvokeHttpMethod [pool-3-thread-13]:
>> invoking RunInstances
2016-11-20 21:41:08,189 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-13]: Receiving response -1425449702: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:08,191 DEBUG o.j.a.h.AWSServerErrorRetryHandler
[pool-3-thread-13]: Retry 1/6: delaying for 541 ms: server error:
[method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract
org.jclouds.ec2.domain.Reservation
org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1,
null, ami-7d7bfc14, 1, 1,
[Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST
https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
2016-11-20 21:41:09,141 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-13]: Receiving response -1388229651: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:09,143 DEBUG o.j.a.h.AWSServerErrorRetryHandler
[pool-3-thread-13]: Retry 2/6: delaying for 2143 ms: server error:
[method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract
org.jclouds.ec2.domain.Reservation
org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1,
null, ami-7d7bfc14, 1, 1,
[Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST
https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
2016-11-20 21:41:11,695 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-13]: Receiving response 1602574625: HTTP/1.1 503 Service
Unavailable
2016-11-20 21:41:11,697 DEBUG o.j.a.h.AWSServerErrorRetryHandler
[pool-3-thread-13]: Retry 3/6: delaying for 4681 ms: server error:
[method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract
org.jclouds.ec2.domain.Reservation
org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1,
null, ami-7d7bfc14, 1, 1,
[Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST
https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
2016-11-20 21:41:17,536 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService
[pool-3-thread-13]: Receiving response 1803030217: HTTP/1.1 200 OK
```
Note that some of the `RunInstances` calls didn't succeed until we'd backed
off multiple times; the call above needed a 4.7 second backoff before it worked
on the 4th attempt. I therefore suspect that retrying after 50ms, 100ms, 200ms,
400ms and 800ms was actually making things *worse* (e.g. making concurrent calls
from other threads a lot more likely to fail, while not succeeding in any of
the 5 retries).
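To put rough numbers on that, the sketch below sums the old retry delays quoted above and the three delays logged for `pool-3-thread-13` before its `200 OK`. It only adds up figures already given in this message; the class and method names are made up for illustration, and request round-trip time is ignored.

```java
import java.util.Arrays;

class BackoffTotals {
    /** Sums a sequence of backoff delays, in milliseconds. */
    static long totalMs(long... delaysMs) {
        return Arrays.stream(delaysMs).sum();
    }

    public static void main(String[] args) {
        // Old schedule as described above: 5 retries, doubling from 50ms.
        long oldTotal = totalMs(50, 100, 200, 400, 800);
        // Delays logged for pool-3-thread-13 (retries 1/6 to 3/6).
        long observedTotal = totalMs(541, 2143, 4681);

        System.out.println("old schedule, all 5 retries: " + oldTotal + " ms");
        System.out.println("observed before success:     " + observedTotal + " ms");
    }
}
```

Under these figures the old schedule uses up all 5 retries after about 1.5 seconds of cumulative backoff, whereas the call above had backed off for about 7.4 seconds in total before it succeeded.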
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/aledsage/brooklyn-server BROOKLYN-394-retry-backoff-time
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/brooklyn-server/pull/448.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #448
----
commit 18cdc98d36f74da10d8987382dba77994de3b75d
Author: Aled Sage <[email protected]>
Date: 2016-11-20T21:52:51Z
BROOKLYN-394: increase jclouds retry/backoff time
----