We have a (not-so-micro-anymore) microservices implementation in which the 
services communicate with each other using Jersey Client. The default 
configuration works fine under a regular test.

However, we have some system tests that are run one after another, including 
some heavy-load tests. Some of these tests now fail with "Connection reset" 
errors from Jersey Client. We have been changing the Dropwizard configuration 
to remedy this problem on every release, as there has always been some 
setting that was not working or was only half implemented in Dropwizard, IIRC.

I believe the problem typically comes down to stale connections in the 
connection pool: some tests cause one of the services to pick up one of these 
stale connections. I think that, at the time, the HttpClient 
`validateAfterInactivityPeriod` and `retries` settings were either not 
supported or not functioning as they should. So we ended up using this 
configuration between services:

jerseyClient:
  timeToLive: 15 minutes

applicationConnectors:
  - type: http
    idleTimeout: 15 minutes


This was, strangely, working fine. I think `timeToLive` was also acting as 
`keepAlive` at the time, and `keepAlive` was not working as it should, IIRC. 
(It was a long time ago, so the details may be rather wrong.) The idea is 
to keep the inter-service connections open for 15 minutes (for performance) 
and to have an understanding between the services about when to kill a 
connection, so that they don't need to bother validating connections. This 
worked until 1.0.0.


With 1.0.0, the "Connection reset" errors have come back. It's rather hard to 
isolate the problem and make it simple to reproduce, but I assume it's still 
the stale connection issue. I'd like to avoid using 
`validateAfterInactivityPeriod` and `retries`, as I'm afraid they might hurt 
performance badly. (By the way, `retries` doesn't work out of the box with 
Jersey Client; it needs 
`config.property(ClientProperties.REQUEST_ENTITY_PROCESSING, 
RequestEntityProcessing.BUFFERED);`.) I have also tried setting `keepAlive` 
to 15 minutes, but that didn't help.
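
For reference, a rough sketch of what the validating/retrying variant of the 
client configuration would look like. It assumes the 1.0.0 `jerseyClient` 
block accepts all of these keys; the `validateAfterInactivityPeriod` and 
`retries` values are illustrative guesses, not settings that have been tested 
against this problem.

```yaml
jerseyClient:
  # Existing settings from the setup described above.
  timeToLive: 15 minutes
  keepAlive: 15 minutes
  # Assumed value: re-check a pooled connection before reuse once it has
  # been idle for longer than this.
  validateAfterInactivityPeriod: 5 seconds
  # Assumed value: retry a request once after an I/O failure. As noted
  # above, this also needs buffered request entity processing.
  retries: 1
```

A short validation period mostly costs an extra check on connections that 
have been sitting idle, so under heavy load (where connections are rarely 
idle) the overhead may be smaller than feared, but that would need measuring.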


Any ideas what might have gone wrong? Or am I being too uptight about 
`retries` and `validateAfterInactivityPeriod`? I will enable retries 
eventually for the sake of safety, but I would prefer not to rely on them as 
the primary fix for this issue. (Also, I'm not sure how buffering entities 
would affect performance.)
