Re: (Micro) services, HttpClient, "SocketException: connection reset"

Natan Abolafya Tue, 09 Aug 2016 02:45:42 -0700

Unfortunately, setting both `timeToLive` and `keepAlive` to 14 minutes while 
keeping `idleTimeout` at 15 minutes didn't help.
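
For reference, a minimal sketch of that configuration as YAML. It assumes 
`applicationConnectors` sits under Dropwizard's `server:` block and that the 
client settings live under a `jerseyClient:` key, as in the original post below:

```yaml
# Client side: intended to expire pooled connections slightly before
# the server's idle timeout would close them.
jerseyClient:
  timeToLive: 14 minutes
  keepAlive: 14 minutes

# Server side (nesting under `server:` is an assumption, see above).
server:
  applicationConnectors:
    - type: http
      idleTimeout: 15 minutes
```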


Natan

On Tuesday, August 9, 2016 at 10:38:15 AM UTC+2, Natan Abolafya wrote:
>
> Hello Artem,
>
> Thanks for the shot, I'll give it a try :).
> I'm upgrading from 0.9 to 1.0. But I was having this problem before as 
> well, though I might have set the configuration that way on 0.7 or 0.8. Did 
> those versions also check the connection before re-using it?
>
> I've tried quite a few combinations, including going back to the default 
> config with `validateAfterInactivity` set to 10 seconds (maybe too much?), 
> but none has worked yet.
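>
> As a rough sketch, that attempt corresponds to a client block like the 
> following. The key name `validateAfterInactivityPeriod` is the spelling used 
> in the Dropwizard configuration, and everything else is assumed to be left 
> at its defaults:
>
> ```yaml
> jerseyClient:
>   # Re-validate a pooled connection before reuse once it has been
>   # idle for longer than this period.
>   validateAfterInactivityPeriod: 10 seconds
> ```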
>
> Natan
>
>
> On Tuesday, August 9, 2016 at 10:12:34 AM UTC+2, Artem Prigoda wrote:
>>
>> Hello Natan,
>>
>> From which version of Dropwizard are you migrating to 1.0.0? Dropwizard's 
>> Jersey client implementation uses Apache's HTTP client. Before version 4.4 
>> (Dropwizard 0.9), the client checked every connection in the pool for 
>> staleness before re-using it. This changed in 4.4, and now a connection is 
>> checked only after the `timeToLive` period. So, if your TTL on the client 
>> side is the same as on the server, there could be situations where the 
>> server sends an RST while the connection is still in the pool on the 
>> client. You could try setting the client timeout a little lower than the 
>> server's and see if this helps. Alternatively, you could try setting 
>> `validateAfterInactivity`, but this will help only if the issue happens 
>> with inactive connections leased back to the pool (which is probably not 
>> your case). 
>> Just a shot in the dark.
>>
>> Artem
>>
>> On Monday, August 8, 2016 at 1:31:19 PM UTC+2, Natan Abolafya wrote:
>>>
>>> We have a (not-so-micro-anymore) services implementation where the 
>>> services communicate with each other using Jersey Client. The default 
>>> configuration always works just fine with a regular test.
>>>
>>> However, we have some system tests that are run one after another, 
>>> including some heavy-load tests. Some of the tests now fail with 
>>> "Connection reset" thrown by Jersey Client. We have been changing the 
>>> Dropwizard configuration to remedy this problem on every release, as there 
>>> has always been some setting that was not working or only half implemented 
>>> in Dropwizard, IIRC. 
>>>
>>> I believe the problem typically comes down to having stale connections 
>>> in the connection pool, with some tests causing one of the services to end 
>>> up using these stale connections. I think, at the time, the HttpClient 
>>> `validateAfterInactivityPeriod` and `retries` settings were either not 
>>> supported or not functioning as they should. So we ended up using 
>>> this configuration between services:
>>>
>>> jerseyClient:
>>>   timeToLive: 15 minutes
>>>
>>> applicationConnectors:
>>> - type: http
>>>   idleTimeout: 15 minutes
>>>
>>> Strangely, this was working fine. I think `timeToLive` was also acting 
>>> as `keepAlive` at the time, and `keepAlive` was not working as it should, 
>>> IIRC. (It was a long time ago, so the details may be rather wrong.) The 
>>> idea is to keep the inter-service connections open for 15 minutes (for 
>>> performance), and to have an understanding between services about when to 
>>> kill the connection, so they wouldn't need to bother validating the 
>>> connections. This was working until 1.0.0.
>>>
>>>
>>> With 1.0.0, "Connection reset" errors have come back. It's rather hard to 
>>> isolate the problem and make it simple to reproduce, but I assume it's 
>>> still the stale connection issue. I'd like to avoid using 
>>> `validateAfterInactivityPeriod` and `retries`, as I'm afraid they might 
>>> affect performance badly. (By the way, `retries` doesn't work out of the 
>>> box with Jersey Client; it needs 
>>> config.property(ClientProperties.REQUEST_ENTITY_PROCESSING, 
>>> RequestEntityProcessing.BUFFERED); to be set.) I have also tried setting 
>>> `keepAlive` to 15 minutes, but that didn't help.
>>>
>>>
>>> Any ideas what might have gone wrong? Or am I being too uptight about 
>>> `retries` and `validateAfterInactivityPeriod`? I will enable retries 
>>> eventually for the sake of safety, but would prefer not to rely on them as 
>>> the primary fix for this issue. (Also, I'm not sure how buffering 
>>> entities would affect performance.)
>>>
>>>
