On Wed, Feb 12, 2020 at 9:33 AM Alexey Panchenko <alex.panche...@gmail.com> wrote:
> Ah, it's actually much simpler than I thought. > ConnectTimeoutException wraps SocketTimeoutExceptions > but you use ExceptionUtils.getRootCause which retrieves the inner most > exception - which is SocketTimeoutException > > so > 1. logging or posting a full stackrace would make the reason obvious from > the beginning > 2.getRootCause() is not always the right thing to do > > > On Wed, Feb 12, 2020 at 5:22 AM Evan J <maps.this.addr...@gmail.com> > wrote: > > > On Tue, Feb 11, 2020 at 3:08 PM Alexey Panchenko < > alex.panche...@gmail.com > > > > > wrote: > > > > > I guess, It can depends on which level load balancer operates. > > > if load balancer is layer 4 (TCP) then for the operating system it > looks > > > like connection is made directly to the target application. In this > case > > > exception should happen during connect and be wrapped as > > > ConnectTimeoutException > > > but if load balancer is layer 7 (HTTP) then TCP connection is made to > the > > > load balancer (which possibly reads the whole request into memory) and > > then > > > it's trying to establish another TCP connection to the target > application > > > and forward request there. > > > While that happens, the client thinks that it already has sent the > > request > > > to the target application and is already waiting for a response. > > > In this situation SocketTimeoutException can happen. > > > > > > On Tue, Feb 11, 2020 at 6:23 PM Evan J <maps.this.addr...@gmail.com> > > > wrote: > > > > > > > Thank you for your response. > > > > > > > > I'm trying to understand your statement. Are you saying when > connection > > > > times out, SocketTimeoutException is thrown, and then the library > > > captures > > > > it and rethrows ConnectTimeoutException? > > > > > > > > Bottom line, first, is what I explained the expected behavior, and > > > second, > > > > how can we properly distinguish between these two types of timeouts? > > > > > > > > On Tue, Feb 11, 2020, 4:24 AM Oleg Kalnichevski <ol...@apache.org> > > > wrote: > > > > > > > > > On Mon, 2020-02-10 at 23:25 -0500, Evan J wrote: > > > > > > Hi, > > > > > > > > > > > > (looks like I'd sent this to a wrong user group originally) > > > > > > > > > > > > We deploy an application B (which is basically a backend > > application > > > > > > serving a web application) to a cluster of application servers > > (JBoss > > > > > > EAP > > > > > > 7.2 -- 8 instances). These instances send HTTP requests to a set > of > > > > > > gateway > > > > > > applications, call them application Gs (10 instances), where the > > > > > > requests > > > > > > go through an F5 loadbalancer that sits between them. > > > > > > > > > > > > Instances of application B are Spring Boot applications (2.1.x) > > that > > > > > > have > > > > > > been configured with Apache HttpClient 4.5.5 and HttpCore 4.4.9. > > The > > > > > > configuration is almost identical to this github code: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/spring-framework-guru/sfg-blog-posts/tree/master/resttemplate/src/main/java/guru/springframework/resttemplate/config > > > > > > > > > > > > The only exception is that RestTemplateConfig#restTemplate() is > > > > > > configured > > > > > > by creation of RestTemplate and passing > > > > > > HttpComponentsClientHttpRequestFactory to its constructor rather > > than > > > > > > using > > > > > > Spring Boot's RestTemplateBuilder(). > > > > > > > > > > > > The keep alive configuration in application B is basically > defined > > > > > > something like: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/spring-framework-guru/sfg-blog-posts/blob/0152fb0c4acf08d019128ca38c3dd2523871c43c/resttemplate/src/main/java/guru/springframework/resttemplate/config/ApacheHttpClientConfig.java#L52 > > > > > > > > > > > > When a load test is executed in a single stack environment where > > the > > > > > > load > > > > > > balancer is omitted (one application B -> one application G), the > > > > > > requests > > > > > > and responses are processed and validated accordingly (each > request > > > > > > should > > > > > > correspond to a right response and be sent to the web application > > > > > > front > > > > > > end). > > > > > > > > > > > > However, when we run the same load test in a distributed > > environment > > > > > > with a > > > > > > load balancer between application B's and application G's, we > see a > > > > > > lot of > > > > > > SocketTimeoutExceptions being logged, and we notice them very > > quickly > > > > > > (about 5% of total of responses in application B throw that > > > > > > exception). > > > > > > > > > > > > The code structure is very straightforward: > > > > > > > > > > > > try { > > > > > > // RestTemplate call > > > > > > } catch (RestClientException exception) { > > > > > > Exception rootCause = ExceptionUtils.getRootCause(exception); > // > > > > > > Apache > > > > > > Commons lib > > > > > > if (rootCause != null) { > > > > > > if > > > > > > > > (SocketTimeoutException.getClass().getName().equals(rootCause.getClas > > > > > > s().getName()) > > > > > > { > > > > > > // or even if (rootCause instanceof SocketTimeoutException) { > > > > > > // Log for socket timeout > > > > > > } > > > > > > if > > > > > > > > (ConnectTimeoutException.getClass().getName().equals(rootCause.getCla > > > > > > ss().getName()) > > > > > > { > > > > > > // or even if (rootCause instanceof ConnectTimeoutException > ) > > { > > > > > > // Log for connection timeout > > > > > > } > > > > > > } > > > > > > } > > > > > > > > > > > > Application B's keep alive has been set to 20 seconds while the > > > > > > socket > > > > > > timeout has been set to 10 seconds by default (connection timeout > > to > > > > > > 1 > > > > > > second). > > > > > > > > > > > > After placing timer to log how long it takes for an exception is > > > > > > thrown, we > > > > > > saw, the time that it took from the moment RestTemplate is > invoked > > > > > > till the > > > > > > exception is thrown was slightly above 1 second, e.g. 1030 ms, > 1045 > > > > > > ms, > > > > > > 1020 ms, etc.. This led us to increase the connection timeouts > > from 1 > > > > > > second to 2 seconds, and afterward, we didn't get any timeout > > > > > > exception of > > > > > > any sort under the similar load. > > > > > > > > > > > > My question is, why is that the majority of exceptions that are > > being > > > > > > thrown have SocketTimeoutExceptions type as opposed to > > > > > > ConnectTimeoutExceptions which, based on the timeout adjustment > > > > > > mentioned > > > > > > above, appears to be the latter (Connect) vs. socket (read) > > timeout? > > > > > > Note > > > > > > that I said the majority of time, as I've seen a few > > > > > > ConnectTimeoutExceptions as well, but almost 99% of the failed > ones > > > > > > are > > > > > > SocketTimeoutExceptions. > > > > > > > > > > > > Also, in our logs, we log the "rootCause's" class name to avoid > > > > > > ambiguity, > > > > > > but as I mentioned, they are being logged as > > SocketTimeoutExceptions > > > > > > class > > > > > > names. > > > > > > > > > > > > What is Apache Components library doing under the hood that > signals > > > > > > the > > > > > > underlying JDK code to throw SocketTimeoutExceptions rather than > > > > > > ConnectTimeoutException > > > > > > > > > > Connection management code in HttpClient re-throws > > > > > SocketTimeoutExceptions thrown while connecting a socket to its > > remote > > > > > endpoint as ConnectTimeoutException to help distinguish connect > > timeout > > > > > from socket (read operation) timeout. SocketTimeoutExceptions are > > > > > always thrown by the JRE, not by HttpClient. > > > > > > > > > > Oleg > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > To unsubscribe, e-mail: httpclient-users-unsubscr...@hc.apache.org > > > > > For additional commands, e-mail: > httpclient-users-h...@hc.apache.org > > > > > > > > > > > > > > > > > > > > > Apologize for top posting. > > > > Alexey, > > > > I have to verify, but I think the load balancer is a pass through so the > > first scenario might be true. > > > > However, let's assume the load balancer is running in ISO layer 7 mode. > If > > the load balancer is consuming the entire request and tries to establish > > another TCP connection to the target servers, when SocketTimeoutException > > is triggered in the source application (B), it should be an indicative > that > > the maximum time allotted to receive packets between the last exchange > and > > the current one has exceeded. > > > > But the only reason this could happen if the socket read between two > > packets has exceeded 10 seconds while the logs show from the moment the > > request has been made from the source, to the moment when the exception > is > > thrown is merely slightly above 1 second (incidentally a smidgen more > than > > the connect timeout). > > > > Moreover, if we increase the socket (read) timeout to 30 seconds, keep > the > > connect timeout at 1 second, and run the test in the same distributed > > environment with load balancer, we still see the same result (as a matter > > fact, the error rate/timeout exceptions actually increases by a few > > percentage -- 2%-3%). > > > > For successful requests, we can verify through the logs on the target > > applications that the elapse time from when the requests are received and > > results put on wire by the target instances doesn't exceed more than 3 > > seconds (much less than 10 seconds). > > > > For those requests that an exceptions is thrown in the source > applications, > > we know for the fact that the requests don't make it to the target > > instances (we have unique Id generated for each request, so if they are > not > > getting logged on the target instances, we know they never made it there > > which I assume make the second scenario [layer 7] unlikely). > > > > Any other idea as to why this is happening would be appreciated. > > > Alexey, I see it now. The ConnectTimeoutException is wrapping SocketTimeoutException which is not the intended exception I am looking for such scenario. Yes, dumping the stacktrace from the original exception should provided the needed details, however, the application has somewhat a unique requirement to send specific Ids for each of these timeout types in order to notify the appropriate team (don't ask me why). I can do an ugly hack to look for a cause, and if it matches connection timeout, pull it from the immediate cause; otherwise, grab the root cause to handle others (there are other custom exceptions that are wrapped by RestClientException, hence the reason). Was there a reason why the Component's implementation has connection exception wrapping socket exception?