I do have a thought: sometimes it's more important to make it work than to pin down exactly what is going wrong. I have had some luck with the Resilience4J library, set up with a short timeout. In my measurements, connections that work complete within, say, 30ms, and any that take more than 50ms never come back. So I set the timeout to 50ms, the delay between retries to 10ms, and the retry count to 3. It can make noise every time it retries, but given the squirrely behavior I was seeing under load, which seemed more infrastructure than server based, the retry solved the problem neatly.
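To make the pattern concrete, here is a minimal sketch of "short per-attempt timeout plus a few quick retries" using only the JDK. It is a stand-in for the idea, not Resilience4J itself and not my actual configuration. The class name TimedRetry and its call method are hypothetical, and I am assuming that "retry count of 3" means three attempts in total.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: plain-JDK illustration of timeout + retry.
final class TimedRetry {

    /**
     * Runs the task up to maxAttempts times (assumption: the count includes
     * the first try). Each attempt is abandoned once it exceeds `timeout`,
     * and the caller waits `delay` before the next attempt.
     */
    static <T> T call(Callable<T> task, Duration timeout, Duration delay, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        // Daemon threads so an abandoned attempt cannot keep the JVM alive.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "timed-retry");
            t.setDaemon(true);
            return t;
        });
        try {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> future = pool.submit(task);
                try {
                    return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    future.cancel(true); // interrupt the slow attempt
                    last = e;
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    last = (cause instanceof Exception) ? (Exception) cause : e;
                }
                // The "noise" on every failed attempt.
                System.err.println("attempt " + attempt + " failed: " + last);
                if (attempt < maxAttempts) {
                    Thread.sleep(delay.toMillis());
                }
            }
            throw last; // every attempt timed out or failed
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated flaky call: the first two attempts hang, the third answers.
        String result = call(() -> {
            if (calls.incrementAndGet() < 3) {
                Thread.sleep(500);
            }
            return "ok";
        }, Duration.ofMillis(50), Duration.ofMillis(10), 3);
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

In Resilience4J the same numbers would, as far as I recall, go into RetryConfig (maxAttempts, waitDuration) and TimeLimiterConfig (timeoutDuration). Check whether the attempt count includes the initial call before copying the value 3.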
The similar-sounding problem I had was with bursts of REST calls between services at different cloud providers, going through different load balancer/API management layers. It seemed to me that a low percentage of my connection attempts would just disappear, like packet loss or a process crash on the load balancer; the remote service would never see a packet.

Just a thought, hope it helps. I love to aggressively chase solutions.

David

On Fri, Mar 15, 2024 at 4:58 PM Richard Tippl <richard.ti...@gmail.com> wrote:

> Hello,
>
> I am supporting a Spring Boot application, which uses HttpClient 5 in the
> background. We're mainly using PoolingHttpClientConnectionManager to send a
> large amount of requests to a target server.
>
> We're experiencing some network issues (socket connection timeouts during
> high load scenarios) and in trying to locate them, I've begun the process
> of trying to look into what actually happens during connection
> establishment.
> My idea was to measure the time it takes for certain steps taken when
> creating a connection. Mainly I wanted to measure TCP socket open and SSL
> handshake.
>
> The initial version I've come up with uses (abuses) the
> ConnectionSocketFactory interface, wrapping it in a way to measure the
> length of execution for connectSocket. This gives the sum of TCP open and
> SSL handshake.
> This way I can at least get some numbers and use them to help with locating
> and resolving the issues.
>
> There are 2 issues with this approach, as far as I can tell, I can't
> measure these times separately, and in the newest alpha version of 5.4 the
> interface I'm using has been deprecated and replaced by the
> DefaultHttpClientConnectionOperator, which performs all of the connection
> steps in a single method call.
>
> Am I missing some easier way to plug into the flow of creating a connection
> and getting the ability to measure what I wish to measure? Will it still be
> possible after the deprecated interfaces get removed? Is there a way I
> could measure both socket open and SSL handshake separately?
> The metrics I've achieved so far already started showing us certain trends
> and extending them could help us more in trying to solve these issues.
>
> Thanks for responding.
>
> Richard
>

-- 
Dog approved this message.