Hello all,

I'm using HttpClient 4.5.1 to make high-volume calls to an internal, highly available
web service that is hosted in multiple geographic locations.  On average, the web service
calls must complete quickly, so we presently make heavy use of HttpClient's connection
pooling and reuse (via PoolingHttpClientConnectionManager) to amortize the TCP and TLS
overhead of establishing new connections.  Our DNS records direct clients of the service
to the (geographically) nearest data center.  Should a data center experience problems,
we update our DNS records to redirect our clients to the remaining "good" site(s).

During an outage, HttpClient appears to fail over to our alternate sites (IPs) promptly
and consistently; however, once service is restored at a particular client node's primary
data center, HttpClient does not reliably switch back to calling the primary site.  (In
such instances, I have confirmed that our DNS records ARE switching back in a timely
manner; the TTLs are set to an appropriately short value.)

My best guess is that the lack of a consistent "switch back" to the primary site is caused by
HttpClient's connection pooling and reuse behavior.  Because of our relatively high (and consistent)
request volume, NEW connections are rarely created once a particular client node has been running for
a while; instead, as noted above, connections in the pool are re-used whenever possible.  Any re-used
connections are still established to the alternate site(s), so client nodes communicating with
alternate sites would generally never (or only VERY gradually) switch back to communicating with the
primary site.  This lines up with what I have observed: "switching back" seems to happen only once
request throughput drops enough for most of a client node's pooled connections to time out and be
closed due to inactivity (e.g. during overnight hours).


I believe a reasonably "standard" way to solve this problem would be to configure a
maximum lifetime for each connection in the pool (e.g. 1 hour), enforced regardless of
whether the connection is idle or could otherwise be re-used.  At first glance, the
HttpClientBuilder.setConnectionTimeToLive() method seemed ideal for this, but upon further
review of the HttpClient code base, this method appears to configure a fixed maximum TTL
without introducing any randomness into each connection's TTL.  As a result, I'm concerned
that if I enable the built-in TTL feature, my clients will experience regular performance
"spikes" at the configured TTL interval, caused when most or all of the pooled connections
expire simultaneously (since they were mostly created at once, at application start-up).
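
For clarity, the built-in setting I am referring to is shown below (the 1-hour value is
just an example; as far as I can tell, this setting applies to the connection manager
that the builder creates internally):

    import java.util.concurrent.TimeUnit;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    // Built-in fixed TTL (available since 4.4): every pooled connection is closed once
    // it reaches this age, but there is no jitter, so connections created together at
    // start-up will all expire together.
    CloseableHttpClient httpClient = HttpClients.custom()
            .setConnectionTimeToLive(1, TimeUnit.HOURS)
            .build();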


Instead of using the built-in TTL setting, I considered overriding the time to live via
the available callbacks (ConnectionKeepAliveStrategy, ConnectionReuseStrategy).
Unfortunately, this approach appears to be infeasible because the parameters to those
callbacks do not provide access to the underlying connection object; there is no "key"
that could be used to look up a connection's lifetime (e.g. in a ConcurrentHashMap) so
that a decision could be made about whether to close or retain the connection.  I also
looked at the various places where I could override the default connection pooling
behavior (e.g. MainClientExec, HttpClientConnectionManager).  HttpClientConnectionManager
appears to be the best bet (specifically, HttpClientConnectionManager.releaseConnection()
would have to be overridden), but this would require duplicating the existing
releaseConnection() code with slight modifications in the overriding class, which seems
brittle.
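
For reference, the callback signature I was looking at is below; as noted, its parameters
do not give me a handle on the underlying pooled connection (the 30-second value is just
an illustrative placeholder):

    import org.apache.http.HttpResponse;
    import org.apache.http.conn.ConnectionKeepAliveStrategy;
    import org.apache.http.protocol.HttpContext;

    // The keep-alive strategy only sees the response and the execution context, so there
    // is no stable per-connection key available here for tracking each connection's age.
    ConnectionKeepAliveStrategy keepAliveStrategy = new ConnectionKeepAliveStrategy() {
        @Override
        public long getKeepAliveDuration(final HttpResponse response, final HttpContext context) {
            return 30 * 1000L; // keep-alive duration in milliseconds
        }
    };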


Can anyone think of a way that I could implement this (cleanly) with HttpClient 
4.x?  Maybe I missed something?

If not, I would be happy to open a JIRA for a possible HttpClient enhancement to support
such a feature.  If people are open to the idea, my initial thought was that adding a more
generic callback might be the best approach (since my use case may not match everyone
else's), but I could also be convinced to make the enhancement request specifically
support a connection-expiration "window" for the TTL feature instead of a single hard
limit (a rough sketch of what I mean is below).
Any thoughts on this?
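
To illustrate the "window" idea, each connection's TTL could be computed along these
lines (hypothetical sketch; the base TTL and jitter fraction are arbitrary):

    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.TimeUnit;

    // Hypothetical jittered TTL: each connection's lifetime is randomized within +/- 20%
    // of a 1-hour base, so connections created together do not all expire at once.
    long baseTtlMillis = TimeUnit.HOURS.toMillis(1);
    double jitterFactor = 0.8 + (0.4 * ThreadLocalRandom.current().nextDouble()); // 0.8 .. 1.2
    long connectionTtlMillis = (long) (baseTtlMillis * jitterFactor);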


Thanks in advance (and sorry for the long email)!

Regards,
Aaron Curley
accw...@gmail.com

