Re: HTTP/2 client and closed connections

2024-03-15 Thread Oleg Kalnichevski




On 14/03/2024 21:53, Jonathan Amir wrote:

Thanks for your answers, they help a lot.
So far what I am understanding about the connection closed situation is
that it is recoverable, but it is the responsibility of the caller to
implement its own retry mechanism.


It is potentially recoverable but the exact recovery strategy tends to 
be application specific.



The retry isn't built-in to the http client code itself but the recovery of
the internally managed connection is handled by the http client.
Is that right?



No, it is not. There is a default retry mechanism in place, but it applies 
to idempotent methods only. Please take a look at 
DefaultHttpRequestRetryStrategy. HttpClient should automatically retry 
idempotent methods, but recovery for non-idempotent methods is application 
specific and is considered the responsibility of the caller.


ConnectionClosedException is presently handled as non-recoverable by 
default. I am actually not sure this should be the case, but this is how 
it is now.


@Michael, do you happen to remember why we ended up treating 
ConnectionClosedException as non-recoverable?


However, it is always advisable to have one's own application-specific 
recovery strategy.
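
As an illustration of what an application-specific recovery strategy might look like, here is a minimal sketch in plain JDK Java. It does not use HttpClient's own retry API. The class name RetryingCaller, the set of methods treated as idempotent, and the attempt limit are all assumptions made for this example.

```java
import java.io.IOException;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of HttpClient: retries a call on IOException,
// but only when the HTTP method is one the application considers safe to repeat.
final class RetryingCaller {

    // Assumption: the methods this application treats as idempotent.
    private static final Set<String> IDEMPOTENT =
            Set.of("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE");

    private final int maxAttempts;

    RetryingCaller(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    <T> T execute(String method, Callable<T> call) throws Exception {
        boolean retriable = IDEMPOTENT.contains(method.toUpperCase(Locale.ROOT));
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return call.call();
            } catch (IOException ex) {
                // Non-idempotent calls and exhausted budgets propagate to the caller.
                if (!retriable || attempt >= maxAttempts) {
                    throw ex;
                }
            }
        }
    }
}
```

A caller would wrap each request, for example `new RetryingCaller(3).execute("GET", () -> sendRequest())`, where sendRequest is whatever hypothetical method actually performs the exchange. What to do for non-idempotent methods such as POST remains a decision for the application.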




Also, a small follow-up question about the TTL, how is it related (or not)
to ConnectionConfig.setValidateAfterInactivity, 


Validate-after-inactivity and total-time-to-live are unrelated. Both, 
however, can cause the connection to be considered expired.


and are those two related to client's builder evictIdleConnections method?


Those two settings are unrelated to idle connection eviction. A 
connection can be perfectly valid, but if it stays idle too long it can 
still get dropped.




Between idleness, TTL, and validation, what is supposed to be the correct
way to use these three configurations together?



This is entirely up to the caller. One may want to have a fairly long 
TTL, say 15 minutes, to ensure connections get refreshed every once in a 
while. The validate-after-inactivity check is not cheap, so one should use 
it sparingly. It is usually up to the caller to decide which is 
preferable: a certain performance hit due to the 
validate-after-inactivity check or an occasional i/o exception due to 
a stale connection. Which connections are considered idle is also 
entirely up to the caller.
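
To make the distinction between the three settings concrete, here is a small conceptual model in plain JDK Java. It is only a sketch of the idea: the class ConnectionAgePolicy and its Decision values are invented for this example, and it folds into a single function checks that the real client applies at different points. The essential difference it shows is that TTL is measured from connection creation, while validation and idle eviction are measured from last use.

```java
import java.time.Duration;
import java.time.Instant;

// Conceptual model only; not HttpClient code.
final class ConnectionAgePolicy {

    enum Decision { REUSE, VALIDATE_BEFORE_REUSE, CLOSE_TTL_EXPIRED, EVICT_IDLE }

    private final Duration timeToLive;              // counted from creation
    private final Duration validateAfterInactivity; // counted from last use
    private final Duration maxIdle;                 // counted from last use

    ConnectionAgePolicy(Duration timeToLive,
                        Duration validateAfterInactivity,
                        Duration maxIdle) {
        this.timeToLive = timeToLive;
        this.validateAfterInactivity = validateAfterInactivity;
        this.maxIdle = maxIdle;
    }

    Decision decide(Instant createdAt, Instant lastUsedAt, Instant now) {
        Duration age = Duration.between(createdAt, now);
        Duration idle = Duration.between(lastUsedAt, now);
        if (age.compareTo(timeToLive) >= 0) {
            return Decision.CLOSE_TTL_EXPIRED;      // too old, however busy it was
        }
        if (idle.compareTo(maxIdle) >= 0) {
            return Decision.EVICT_IDLE;             // valid, but unused for too long
        }
        if (idle.compareTo(validateAfterInactivity) >= 0) {
            return Decision.VALIDATE_BEFORE_REUSE;  // pay for a staleness check
        }
        return Decision.REUSE;
    }
}
```

With a 15 minute TTL, a 10 second validation threshold and a 1 minute idle limit, a five-minute-old connection last used 30 seconds ago would be validated before reuse, while one last used 2 minutes ago would be evicted as idle.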


Oleg



On Wed, Mar 13, 2024 at 5:14 AM Oleg Kalnichevski  wrote:


On Tue, 2024-03-12 at 21:58 -0400, Jonathan Amir wrote:

Hello,
I am building an HTTP/2 only client for running multiple requests in
parallel.
I understand that there is no connection pool internally, rather there is
one connection per host. For simplicity, let's say all my requests go to
the same host.



This is correct. It is still technically a pool though (on a per-host
basis).



I have a situation where under stress there are some errors. It starts with
socket timeout (several threads in parallel), and after a while there is a
ConnectionClosedException.


Even though HTTP/2 has a proper connection termination handshake, the
handshake is potentially racy. Under high load
ConnectionClosedException can and will happen. Your application code
must be prepared to handle those.



I am not sure what is the flow of events that leads to this, and what is
the relationship between those errors. I also don't know if it is my client
or the server that closed the connection.



It, of course, would help greatly to know what exactly happens and
leads to ConnectionClosedException.



My initial question is, since there is only one connection maintained
internally, how does one recover from ConnectionClosedException? The
connection life-cycle is opaque to me - there is no pool, and no eviction
strategy, so no concept of creating a new connection. So what am I
missing? Is the httpClient object still usable after a
ConnectionClosedException?



The internal connection pool can automatically re-establish a closed
connection once the connection termination handshake completes or the
connection gets dropped abnormally.



Somewhat related, I am looking at the sample here:


https://hc.apache.org/httpcomponents-client-5.3.x/migration-guide/migration-to-async-http2.html

What is the difference between the two socket timeout configurations, on
IOReactorConfig and ConnectionConfig?


Both represent the same timeout but apply at different levels.
The IOReactorConfig settings apply at the i/o reactor level and are
specific to the async i/o model. The ConnectionConfig settings apply at
the connection management level and are not specific to any i/o model.



What is the time to live?


You mean TTL, total time to live? It is the maximum period of time
connections can be kept alive and re-used. Once past their TTL, connections
get automatically closed out.

Hope this helps

Oleg








Connection establishment metrics

2024-03-15 Thread Richard Tippl
Hello,

I am supporting a Spring Boot application, which uses HttpClient 5 in the
background. We're mainly using PoolingHttpClientConnectionManager to send a
large amount of requests to a target server.

We're experiencing some network issues (socket connection timeouts during
high load scenarios) and in trying to locate them, I've begun the process
of trying to look into what actually happens during connection
establishment.
My idea was to measure the time it takes for certain steps taken when
creating a connection. Mainly I wanted to measure TCP socket open and SSL
handshake.

The initial version I've come up with uses (abuses) the
ConnectionSocketFactory interface, wrapping it in a way to measure the
length of execution for connectSocket. This gives the sum of TCP open and
SSL handshake.
This way I can at least get some numbers and use them to help with locating
and resolving the issues.

There are two issues with this approach, as far as I can tell. First, I
can't measure these times separately. Second, in the newest alpha version
of 5.4 the interface I'm using has been deprecated and replaced by the
DefaultHttpClientConnectionOperator, which performs all of the connection
steps in a single method call.

Am I missing some easier way to plug into the flow of creating a connection
and getting the ability to measure what I wish to measure? Will it still be
possible after the deprecated interfaces get removed? Is there a way I
could measure both socket open and SSL handshake separately?
The metrics I've gathered so far have already started showing us certain
trends, and extending them could help us more in trying to solve these issues.
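
For reference, the two phases can be timed separately with plain JDK sockets, outside HttpClient altogether. The sketch below is only a standalone probe under that assumption: it does not hook into HttpClient's connection management, and the names ConnectTimings, IoStep and probe are invented for this example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical probe: records how long each named connection phase takes.
final class ConnectTimings {

    @FunctionalInterface
    interface IoStep {
        void run() throws IOException;
    }

    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    // Runs one step and records its duration, even if the step fails.
    void time(String phase, IoStep step) throws IOException {
        long start = System.nanoTime();
        try {
            step.run();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    Map<String, Long> nanosByPhase() {
        return nanosByPhase;
    }

    // Opens a TLS connection with JDK sockets, timing TCP connect and TLS handshake apart.
    static ConnectTimings probe(String host, int port, int timeoutMillis) throws IOException {
        ConnectTimings timings = new ConnectTimings();
        SSLSocketFactory tls = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (Socket plain = new Socket()) {
            plain.setSoTimeout(timeoutMillis); // bounds blocking reads during the handshake
            timings.time("tcp-connect",
                    () -> plain.connect(new InetSocketAddress(host, port), timeoutMillis));
            // Layer TLS over the connected socket; autoClose=true closes 'plain' as well.
            try (SSLSocket secure = (SSLSocket) tls.createSocket(plain, host, port, true)) {
                timings.time("tls-handshake", secure::startHandshake);
            }
        }
        return timings;
    }
}
```

Calling `ConnectTimings.probe("example.org", 443, 5000).nanosByPhase()` would return the two durations in nanoseconds, keyed by phase name. Running such a probe from the same host as the application gives an independent reading to compare against the combined connectSocket figure.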

Thanks for responding.

Richard


Re: Connection establishment metrics

2024-03-15 Thread Skylos
I do have a thought - sometimes it's just important to make it work, not to
drill into exactly what is wrong. I have had some luck with the
Resilience4J library, where I set it up with a short timeout. In my
measurements, connections that work do so within, say, 30ms, and if they
take more than 50ms they're never coming back. So I set the timeout to 50ms,
the delay to 10ms, and the retry count to 3.  It can make noise every time it
retries, but given the squirrely behavior I was seeing under load,
which seemed more infrastructure than server based, the retry solved the
problem neatly.
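
The pattern described here (a short per-attempt timeout, a fixed delay, and a bounded number of attempts) can be sketched in plain JDK Java as follows. This is not Resilience4J's API; the class TimedRetry and its parameters are hypothetical, and "retry count" is interpreted here as the total number of attempts.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounded retries with a per-attempt timeout and a fixed delay.
final class TimedRetry {

    static <T> T call(ExecutorService executor, Callable<T> task,
                      Duration perAttemptTimeout, Duration delay, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Future<T> future = executor.submit(task);
            try {
                return future.get(perAttemptTimeout.toMillis(), TimeUnit.MILLISECONDS);
            } catch (TimeoutException ex) {
                future.cancel(true); // interrupt the slow attempt and move on
                last = ex;
            } catch (ExecutionException ex) {
                last = ex;           // the attempt itself failed
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delay.toMillis());
            }
        }
        throw last; // never null here: every attempt either returned or set it
    }
}
```

With the numbers from the message, the call would be `TimedRetry.call(pool, task, Duration.ofMillis(50), Duration.ofMillis(10), 3)`. Note that cancelling a Future only interrupts the worker thread, so the underlying client should still carry its own connect timeout.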

The similar-sounding problem I had was with spates of REST calls between
services at different cloud providers going through different load
balancer/API management layers.  It seemed to me that a low
percentage of my attempts to connect would just disappear, as if from packet
loss or a process crash on the load balancer, and the remote service would
never see a packet.

Just a thought, hope it helps.  I love to aggressively chase solutions.

David





-- 
Dog approved this message.


Re: Connection establishment metrics

2024-03-15 Thread Richard Tippl
The interesting part of my case is that the connection timeout is set to 5
seconds, and while it times out sometimes, the above-mentioned metrics I've
already created reveal that some connections start taking 1-3 seconds to
actually fully establish.

This would indicate some low level OS/networking timeout might have been
reached and a retry/resend happened automatically.
I could certainly lower the timeout to something way more reasonable and
implement a retry mechanism to try again a few more times,
but at this point of troubleshooting, it seems like trying to mask the
underlying issue without solving it, which could bite us later on.
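
One possible reading of the 1-3 second establishment times, offered as an assumption rather than a diagnosis: if the initial SYN is lost, the TCP stack retransmits it after an initial retransmission timeout and doubles the wait after each further loss. RFC 6298 recommends an initial timeout of 1 second, though the actual value and backoff on a given OS may differ. The sketch below only does that arithmetic; the class name SynBackoff is invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Arithmetic sketch only: when would retransmitted SYNs go out under exponential backoff?
final class SynBackoff {

    // Returns the elapsed time (ms) at which the 1st, 2nd, ... retransmitted SYN is sent,
    // assuming the timeout doubles after every loss.
    static List<Long> retransmitTimesMillis(long initialRtoMillis, int retransmits) {
        List<Long> times = new ArrayList<>();
        long elapsed = 0;
        long rto = initialRtoMillis;
        for (int i = 0; i < retransmits; i++) {
            elapsed += rto;
            times.add(elapsed);
            rto *= 2;
        }
        return times;
    }
}
```

Under these assumptions `SynBackoff.retransmitTimesMillis(1000, 3)` yields 1000, 3000 and 7000 ms. A connection that succeeds on the first or second retransmit would then complete a little after 1 or 3 seconds, and a third loss would run past a 5 second connect timeout.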

As such, for now I would love to actually find the issue (it's a fresh new
piece of infrastructure we're migrating to) but even limiting any dumps
severely, we're looking at large amounts of packets per minute.
Having metrics for each step of the handshake could help reveal
misconfigurations we've performed while setting it all up.

Thanks for the mention regardless, we're already using Resilience4j for its
circuit breaking capabilities, so if all attempts fail, the retry is
certainly an option.

Richard

