Hi, (it looks like I originally sent this to the wrong user group)
We deploy an application B (essentially a backend serving a web application) to a cluster of application servers (JBoss EAP 7.2, 8 instances). These instances send HTTP requests to a set of gateway applications, call them application G (10 instances), through an F5 load balancer that sits between the two tiers.

The instances of application B are Spring Boot (2.1.x) applications configured with Apache HttpClient 4.5.5 and HttpCore 4.4.9. The configuration is almost identical to this GitHub code: https://github.com/spring-framework-guru/sfg-blog-posts/tree/master/resttemplate/src/main/java/guru/springframework/resttemplate/config The only difference is that RestTemplateConfig#restTemplate() creates the RestTemplate directly and passes an HttpComponentsClientHttpRequestFactory to its constructor, rather than using Spring Boot's RestTemplateBuilder. The keep-alive configuration in application B is defined much like this: https://github.com/spring-framework-guru/sfg-blog-posts/blob/0152fb0c4acf08d019128ca38c3dd2523871c43c/resttemplate/src/main/java/guru/springframework/resttemplate/config/ApacheHttpClientConfig.java#L52

When a load test is run in a single-stack environment without the load balancer (one application B -> one application G), the requests and responses are processed and validated as expected (each request is matched to the right response, which is sent to the web application front end). However, when we run the same load test in a distributed environment with the load balancer between the application B and application G instances, we see a lot of SocketTimeoutExceptions being logged, and they show up very quickly (about 5% of all responses in application B throw that exception).
The code structure is very straightforward:

    try {
        // RestTemplate call
    } catch (RestClientException exception) {
        Throwable rootCause = ExceptionUtils.getRootCause(exception); // Apache Commons Lang
        if (rootCause != null) {
            if (SocketTimeoutException.class.getName().equals(rootCause.getClass().getName())) {
                // or even: if (rootCause instanceof SocketTimeoutException)
                // Log for socket timeout
            }
            if (ConnectTimeoutException.class.getName().equals(rootCause.getClass().getName())) {
                // or even: if (rootCause instanceof ConnectTimeoutException)
                // Log for connection timeout
            }
        }
    }

Application B's keep-alive is set to 20 seconds, the socket timeout to 10 seconds (the default), and the connection timeout to 1 second. After adding a timer to log how long it takes for an exception to be thrown, we saw that the time from the RestTemplate invocation until the exception was slightly above 1 second, e.g. 1030 ms, 1045 ms, 1020 ms, etc. This led us to increase the connection timeout from 1 second to 2 seconds, and afterward we didn't get timeout exceptions of any sort under a similar load.

My question is: why are the majority of the exceptions thrown SocketTimeoutExceptions rather than ConnectTimeoutExceptions, when the timeout adjustment described above suggests the failures are connect timeouts rather than socket (read) timeouts? Note that I said the majority, as I've seen a few ConnectTimeoutExceptions as well, but almost 99% of the failures are SocketTimeoutExceptions. Also, we log the class name of the root cause to avoid ambiguity, and, as mentioned, it is logged as SocketTimeoutException.

What is the Apache HttpComponents library doing under the hood that causes the underlying JDK code to throw SocketTimeoutException rather than ConnectTimeoutException?
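One thing that may be worth checking, offered here as an assumption rather than something confirmed against the HttpClient source: if HttpClient's ConnectTimeoutException is created with the JDK's SocketTimeoutException as its cause, then ExceptionUtils.getRootCause() would walk past the wrapper and report the JDK exception, so a connect timeout would be logged as a SocketTimeoutException. The sketch below reproduces that effect using only the JDK. HypotheticalConnectTimeout, rootCause, findInChain and sampleChain are illustrative names, not HttpClient or Commons Lang APIs, and the exception messages are made up.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;

class CauseChainDemo {

    // Hypothetical stand-in for HttpClient's ConnectTimeoutException:
    // a wrapper that keeps the JDK exception as its cause (assumption).
    static class HypotheticalConnectTimeout extends InterruptedIOException {
        HypotheticalConnectTimeout(IOException cause) {
            super("Connect to host timed out");
            initCause(cause);
        }
    }

    // Simplified analogue of a root-cause lookup: follow getCause() to the end.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Return the first throwable in the chain that is an instance of 'type', or null.
    static <T extends Throwable> T findInChain(Throwable t, Class<T> type) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (type.isInstance(c)) {
                return type.cast(c);
            }
        }
        return null;
    }

    // Builds: outer RuntimeException -> connect-timeout wrapper -> JDK SocketTimeoutException.
    static Throwable sampleChain() {
        SocketTimeoutException jdk = new SocketTimeoutException("connect timed out");
        HypotheticalConnectTimeout wrapped = new HypotheticalConnectTimeout(jdk);
        return new RuntimeException("I/O error on request", wrapped);
    }

    public static void main(String[] args) {
        Throwable ex = sampleChain();
        // The root cause is the JDK exception, so the wrapper type is lost.
        System.out.println(rootCause(ex).getClass().getName());
        // Searching the whole chain still finds the wrapper.
        System.out.println(findInChain(ex, HypotheticalConnectTimeout.class) != null);
    }
}
```

If this assumption holds for the real library, checking the whole cause chain for ConnectTimeoutException before falling back to the root cause (for example with Commons Lang's ExceptionUtils.indexOfType), or logging the root cause's message as well as its class name, should separate connect timeouts from read timeouts.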