Unfortunately, setting `timeToLive` and `keepAlive` to 14 minutes while keeping `idleTimeout` at 15 minutes didn't help.
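For reference, the combination described above would look roughly like this in a Dropwizard YAML file. This is a sketch only: the values are the ones mentioned in this message, and the `server:` wrapper is an assumption based on Dropwizard's usual configuration layout (the snippet quoted below omits it).

```yaml
# Client side: stop reusing pooled connections slightly before the server drops them.
jerseyClient:
  timeToLive: 14 minutes
  keepAlive: 14 minutes

# Server side (assumed to sit under the `server:` key).
server:
  applicationConnectors:
    - type: http
      idleTimeout: 15 minutes
```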

Natan

On Tuesday, August 9, 2016 at 10:38:15 AM UTC+2, Natan Abolafya wrote:
>
> Hello Artem,
>
> Thanks for the shot, I'll give it a try :).
> I'm upgrading from 0.9 to 1.0. But I was having this problem before as
> well, though I might have set the configuration as such on 0.7 or 0.8. Were
> they also checking the connection before re-using it?
>
> I've tried quite a few combinations, including going back to the default
> config with `validateAfterInactivity` set to 10 seconds (maybe too much?),
> but none has worked yet.
>
> Natan
>
>
> On Tuesday, August 9, 2016 at 10:12:34 AM UTC+2, Artem Prigoda wrote:
>>
>> Hello Natan,
>>
>> From which version of Dropwizard are you migrating to 1.0.0? Dropwizard's
>> Jersey client implementation uses Apache's HTTP client. Before version 4.4
>> (Dropwizard 0.9) the client checked every connection in the pool for
>> being stale before re-using it. It was changed in 4.4 and now it's
>> checked only after the `timeToLive` period. So, if your TTL on the client
>> side is the same as on the server, there could be situations when the
>> server could send a RST flag while a connection is still in the pool on
>> the client. You could try to set the timeout a little bit less than the
>> server's and see if this helps. Alternatively, you could try to set
>> `validateAfterInactivity`, but this will help only if the issue happens
>> with inactive connections leased back to the pool (which is probably not
>> your case).
>> Just a shot in the dark.
>>
>> Artem
>>
>> On Monday, August 8, 2016 at 1:31:19 PM UTC+2, Natan Abolafya wrote:
>>>
>>> We have a (not-so-micro-anymore) services implementation where the
>>> services communicate with each other using Jersey Client. The default
>>> configuration always works just fine with a regular test.
>>>
>>> However, we have some system tests that are run one after another,
>>> including some heavy-load tests. Some of the tests now fail with
>>> "Connection Reset" by jersey client.
>>> We have been changing the Dropwizard configuration to
>>> remedy this problem on every release, as there has always been a
>>> configuration not working, or half implemented in Dropwizard, IIRC.
>>>
>>> I believe, typically, the problem comes down to having stale connections
>>> in the connection pool, and some tests making one of the services end up
>>> using these stale connections. I think, at the time, the HttpClient
>>> `validateAfterInactivityPeriod` and `retries` configurations were not
>>> supported or were not functioning as they should. So we ended up using
>>> this configuration between services:
>>>
>>> jerseyClient:
>>>   timeToLive: 15 minutes
>>>
>>> applicationConnectors:
>>>   - type: http
>>>     idleTimeout: 15 minutes
>>>
>>> This was, strangely, working fine. I think `timeToLive` was also acting
>>> as `keepAlive` at the time, and `keepAlive` was not working as it should,
>>> IIRC. (It was a long time ago, so the details may be rather wrong.) The
>>> idea is to keep the inter-service connections open for 15 minutes (for
>>> performance), and have an understanding between services about when to
>>> kill the connection, so they wouldn't bother validating the connections.
>>> This was working until 1.0.0.
>>>
>>> With 1.0.0, "Connection reset" errors have come back. It's rather hard to
>>> isolate the problem and make it simple to reproduce, but I assume it's
>>> still the stale connection issue. I'd like to avoid using
>>> `validateAfterInactivityPeriod` and `retries` (which doesn't work out of
>>> the box with Jersey Client, by the way; it needs
>>> config.property(ClientProperties.REQUEST_ENTITY_PROCESSING,
>>> RequestEntityProcessing.BUFFERED);) as I'm afraid it might affect the
>>> performance badly. I have tried to set `keepAlive` also to 15 minutes, but
>>> that didn't help.
>>>
>>> Any ideas what might have gone wrong? Or am I being too uptight with
>>> `retries` and `validateAfterInactivityPeriod`?
>>> I will enable retries
>>> eventually for the sake of safety, but would prefer not to have it as the
>>> primary method to fix this issue. (Also, I'm not sure how buffering
>>> entities would affect the performance.)
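
The alternative discussed in the thread, validating idle connections and retrying failed requests, might be configured roughly as follows. This is a sketch, not a recommendation: the 10 seconds value is the one mentioned in the follow-up reply, `retries: 1` is an illustrative assumption, and as noted in the original message, retries also need buffered request entity processing set on the Jersey client.

```yaml
jerseyClient:
  # Re-validate a pooled connection if it has been idle longer than this
  # before handing it out again (value taken from the reply in this thread).
  validateAfterInactivityPeriod: 10 seconds
  # Illustrative value only: retry a failed request once.
  retries: 1
```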
