Ah, it's actually much simpler than I thought. ConnectTimeoutException wraps SocketTimeoutExceptions but you use ExceptionUtils.getRootCause which retrieves the inner most exception - which is SocketTimeoutException
so 1. logging or posting a full stackrace would make the reason obvious from the beginning 2.getRootCause() is not always the right thing to do On Wed, Feb 12, 2020 at 5:22 AM Evan J <maps.this.addr...@gmail.com> wrote: > On Tue, Feb 11, 2020 at 3:08 PM Alexey Panchenko <alex.panche...@gmail.com > > > wrote: > > > I guess, It can depends on which level load balancer operates. > > if load balancer is layer 4 (TCP) then for the operating system it looks > > like connection is made directly to the target application. In this case > > exception should happen during connect and be wrapped as > > ConnectTimeoutException > > but if load balancer is layer 7 (HTTP) then TCP connection is made to the > > load balancer (which possibly reads the whole request into memory) and > then > > it's trying to establish another TCP connection to the target application > > and forward request there. > > While that happens, the client thinks that it already has sent the > request > > to the target application and is already waiting for a response. > > In this situation SocketTimeoutException can happen. > > > > On Tue, Feb 11, 2020 at 6:23 PM Evan J <maps.this.addr...@gmail.com> > > wrote: > > > > > Thank you for your response. > > > > > > I'm trying to understand your statement. Are you saying when connection > > > times out, SocketTimeoutException is thrown, and then the library > > captures > > > it and rethrows ConnectTimeoutException? > > > > > > Bottom line, first, is what I explained the expected behavior, and > > second, > > > how can we properly distinguish between these two types of timeouts? > > > > > > On Tue, Feb 11, 2020, 4:24 AM Oleg Kalnichevski <ol...@apache.org> > > wrote: > > > > > > > On Mon, 2020-02-10 at 23:25 -0500, Evan J wrote: > > > > > Hi, > > > > > > > > > > (looks like I'd sent this to a wrong user group originally) > > > > > > > > > > We deploy an application B (which is basically a backend > application > > > > > serving a web application) to a cluster of application servers > (JBoss > > > > > EAP > > > > > 7.2 -- 8 instances). These instances send HTTP requests to a set of > > > > > gateway > > > > > applications, call them application Gs (10 instances), where the > > > > > requests > > > > > go through an F5 loadbalancer that sits between them. > > > > > > > > > > Instances of application B are Spring Boot applications (2.1.x) > that > > > > > have > > > > > been configured with Apache HttpClient 4.5.5 and HttpCore 4.4.9. > The > > > > > configuration is almost identical to this github code: > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/spring-framework-guru/sfg-blog-posts/tree/master/resttemplate/src/main/java/guru/springframework/resttemplate/config > > > > > > > > > > The only exception is that RestTemplateConfig#restTemplate() is > > > > > configured > > > > > by creation of RestTemplate and passing > > > > > HttpComponentsClientHttpRequestFactory to its constructor rather > than > > > > > using > > > > > Spring Boot's RestTemplateBuilder(). > > > > > > > > > > The keep alive configuration in application B is basically defined > > > > > something like: > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/spring-framework-guru/sfg-blog-posts/blob/0152fb0c4acf08d019128ca38c3dd2523871c43c/resttemplate/src/main/java/guru/springframework/resttemplate/config/ApacheHttpClientConfig.java#L52 > > > > > > > > > > When a load test is executed in a single stack environment where > the > > > > > load > > > > > balancer is omitted (one application B -> one application G), the > > > > > requests > > > > > and responses are processed and validated accordingly (each request > > > > > should > > > > > correspond to a right response and be sent to the web application > > > > > front > > > > > end). > > > > > > > > > > However, when we run the same load test in a distributed > environment > > > > > with a > > > > > load balancer between application B's and application G's, we see a > > > > > lot of > > > > > SocketTimeoutExceptions being logged, and we notice them very > quickly > > > > > (about 5% of total of responses in application B throw that > > > > > exception). > > > > > > > > > > The code structure is very straightforward: > > > > > > > > > > try { > > > > > // RestTemplate call > > > > > } catch (RestClientException exception) { > > > > > Exception rootCause = ExceptionUtils.getRootCause(exception); // > > > > > Apache > > > > > Commons lib > > > > > if (rootCause != null) { > > > > > if > > > > > > (SocketTimeoutException.getClass().getName().equals(rootCause.getClas > > > > > s().getName()) > > > > > { > > > > > // or even if (rootCause instanceof SocketTimeoutException) { > > > > > // Log for socket timeout > > > > > } > > > > > if > > > > > > (ConnectTimeoutException.getClass().getName().equals(rootCause.getCla > > > > > ss().getName()) > > > > > { > > > > > // or even if (rootCause instanceof ConnectTimeoutException ) > { > > > > > // Log for connection timeout > > > > > } > > > > > } > > > > > } > > > > > > > > > > Application B's keep alive has been set to 20 seconds while the > > > > > socket > > > > > timeout has been set to 10 seconds by default (connection timeout > to > > > > > 1 > > > > > second). > > > > > > > > > > After placing timer to log how long it takes for an exception is > > > > > thrown, we > > > > > saw, the time that it took from the moment RestTemplate is invoked > > > > > till the > > > > > exception is thrown was slightly above 1 second, e.g. 1030 ms, 1045 > > > > > ms, > > > > > 1020 ms, etc.. This led us to increase the connection timeouts > from 1 > > > > > second to 2 seconds, and afterward, we didn't get any timeout > > > > > exception of > > > > > any sort under the similar load. > > > > > > > > > > My question is, why is that the majority of exceptions that are > being > > > > > thrown have SocketTimeoutExceptions type as opposed to > > > > > ConnectTimeoutExceptions which, based on the timeout adjustment > > > > > mentioned > > > > > above, appears to be the latter (Connect) vs. socket (read) > timeout? > > > > > Note > > > > > that I said the majority of time, as I've seen a few > > > > > ConnectTimeoutExceptions as well, but almost 99% of the failed ones > > > > > are > > > > > SocketTimeoutExceptions. > > > > > > > > > > Also, in our logs, we log the "rootCause's" class name to avoid > > > > > ambiguity, > > > > > but as I mentioned, they are being logged as > SocketTimeoutExceptions > > > > > class > > > > > names. > > > > > > > > > > What is Apache Components library doing under the hood that signals > > > > > the > > > > > underlying JDK code to throw SocketTimeoutExceptions rather than > > > > > ConnectTimeoutException > > > > > > > > Connection management code in HttpClient re-throws > > > > SocketTimeoutExceptions thrown while connecting a socket to its > remote > > > > endpoint as ConnectTimeoutException to help distinguish connect > timeout > > > > from socket (read operation) timeout. SocketTimeoutExceptions are > > > > always thrown by the JRE, not by HttpClient. > > > > > > > > Oleg > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > To unsubscribe, e-mail: httpclient-users-unsubscr...@hc.apache.org > > > > For additional commands, e-mail: httpclient-users-h...@hc.apache.org > > > > > > > > > > > > > > > Apologize for top posting. > > Alexey, > > I have to verify, but I think the load balancer is a pass through so the > first scenario might be true. > > However, let's assume the load balancer is running in ISO layer 7 mode. If > the load balancer is consuming the entire request and tries to establish > another TCP connection to the target servers, when SocketTimeoutException > is triggered in the source application (B), it should be an indicative that > the maximum time allotted to receive packets between the last exchange and > the current one has exceeded. > > But the only reason this could happen if the socket read between two > packets has exceeded 10 seconds while the logs show from the moment the > request has been made from the source, to the moment when the exception is > thrown is merely slightly above 1 second (incidentally a smidgen more than > the connect timeout). > > Moreover, if we increase the socket (read) timeout to 30 seconds, keep the > connect timeout at 1 second, and run the test in the same distributed > environment with load balancer, we still see the same result (as a matter > fact, the error rate/timeout exceptions actually increases by a few > percentage -- 2%-3%). > > For successful requests, we can verify through the logs on the target > applications that the elapse time from when the requests are received and > results put on wire by the target instances doesn't exceed more than 3 > seconds (much less than 10 seconds). > > For those requests that an exceptions is thrown in the source applications, > we know for the fact that the requests don't make it to the target > instances (we have unique Id generated for each request, so if they are not > getting logged on the target instances, we know they never made it there > which I assume make the second scenario [layer 7] unlikely). > > Any other idea as to why this is happening would be appreciated. >