On Fri, 2016-05-06 at 07:17 -0700, Aaron Curley wrote:
> On 5/6/2016 5:15 AM, Oleg Kalnichevski wrote:
> > On Thu, 2016-05-05 at 07:12 -0700, Aaron Curley wrote:
> >> On 5/3/2016 9:49 AM, Oleg Kalnichevski wrote:
> >>> On Tue, 2016-05-03 at 09:15 -0700, Aaron Curley wrote:
> >>>> On 5/3/2016 8:22 AM, Oleg Kalnichevski wrote:
> >>>>> On Mon, 2016-05-02 at 21:28 -0700, Aaron Curley wrote:
> >>>>>> Hello all,
> >>>>>>
> >>>>>> I'm using HttpClient 4.5.1 to make high-volume calls to an internal, highly available web service that is hosted at multiple geographic locations. On average, the web service calls need to execute rather quickly; we therefore make heavy use of HttpClient's connection pooling and reuse behavior (via PoolingHttpClientConnectionManager) to avoid the TCP and TLS overhead of setting up a new connection. Our DNS records direct clients of the service to the geographically nearest data center. Should a data center experience problems, we update our DNS records to redirect our clients to the remaining "good" site(s).
> >>>>>>
> >>>>>> During an outage, HttpClient appears to fail over to our alternate sites (IPs) promptly and consistently; however, once service is restored at a client node's primary data center, HttpClient does not reliably switch back to calling the primary site. (In such instances, I have confirmed that our DNS records ARE switching back in a timely manner; the TTLs are set to an appropriately short value.)
> >>>>>>
> >>>>>> My best guess is that the lack of a consistent "switch back" to the primary site is caused by HttpClient's connection pooling and reuse behavior. Because of our relatively high (and steady) request volume, creation of NEW connections is likely a rarity once a client node has been running for a while; instead, as noted above, connections in the pool are re-used whenever possible. Any re-used connection is still established with the alternate site(s), so client nodes talking to alternate sites would never (or only VERY gradually) switch back to the primary site. This matches what I have observed: "switching back" seems to happen only once request throughput drops far enough for most of a node's pooled connections to time out and be closed due to inactivity (e.g. during overnight hours).
> >>>>>>
> >>>>>> I believe a reasonably "standard" way to solve this problem would be to configure a maximum lifetime for each connection in the pool (e.g. 1 hour), enforced regardless of whether the connection is idle or could otherwise be re-used. At first glance, the HttpClientBuilder.setConnectionTimeToLive() method seemed ideal for this, but on closer review of the HttpClient code base, it configures a fixed maximum TTL without introducing any element of randomness into each connection's TTL. As a result, I'm concerned that if I enable the built-in TTL feature, my clients are likely to experience regular performance "spikes" at the configured TTL interval. (These would occur when most or all of the pooled connections expire simultaneously, since they were mostly all created at once, at application start-up.)
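[For readers following along: the built-in cap under discussion is set on the client builder. A minimal sketch against the 4.5.x API; the one-hour figure simply mirrors the example above.]

    import java.util.concurrent.TimeUnit;

    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    // Hard cap: a pooled connection is never re-leased once it is more
    // than an hour old, no matter how recently it was last used.
    CloseableHttpClient client = HttpClients.custom()
            .setConnectionTimeToLive(1, TimeUnit.HOURS)
            .build();

[As the thread notes, every connection created in the same start-up burst gets the same fixed TTL; there is no built-in jitter.]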
> >>>>>> Instead of using the built-in TTL setting, I considered overriding the time to live via the available callbacks (ConnectionKeepAliveStrategy, ConnectionReuseStrategy). Unfortunately, this approach appears to be infeasible because the parameters of those callbacks do not expose the underlying connection object; there is no "key" that could be used to look up a connection's lifetime (e.g. in a ConcurrentHashMap) so that a decision could be made about whether to close or retain the connection. I also looked at the various places where I could override the default connection pooling behavior (e.g. MainClientExec, HttpClientConnectionManager). HttpClientConnectionManager appears to be the best bet (specifically, releaseConnection() would have to be overridden), but that would require duplicating the existing releaseConnection() code with slight modifications in the overriding class. This seems brittle.
> >>>>>>
> >>>>>> Can anyone think of a way that I could implement this (cleanly) with HttpClient 4.x? Maybe I missed something?
> >>>>>>
> >>>>>> If not, I would be happy to open a JIRA for a possible HttpClient enhancement. If people are open to the idea, I was generally thinking that adding a more generic callback might be the best approach (since my use case may not be everyone's), but I could also be convinced to have the enhancement request specifically support a connection expiration "window" for the TTL feature instead of a hard limit. Any thoughts on this?
> >>>>>>
> >>>>>> Thanks in advance (and sorry for the long email)!
> >>>>>
> >>>>> Hi Aaron
> >>>>>
> >>>>> PoolingHttpClientConnectionManager does not pro-actively evict expired connections by default. I think it is unlikely that connections with a finite TTL would all get evicted at the same time. Having said that, you are welcome to contribute a patch enabling a TTL setting on a per-pool-entry basis.
> >>>>>
> >>>>> Oleg
> >>>>
> >>>> Hi Oleg,
> >>>>
> >>>> Thanks for your response. I would be happy to submit a patch, but before doing so, I'd want to make sure my concerns about the current TTL implementation are actually real.
> >>>>
> >>>> Due to the lack of background threads, the connection pool would only remove an expired connection synchronously (i.e. after a request using that connection has completed), right?
> >>>
> >>> Not quite. Connection validity is checked upon connection lease, not upon connection release.
> >>>
> >>>> When you say "PoolingHttpClientConnectionManager does not pro-actively evict expired connections ...", are you referring to this synchronous eviction behavior or something else?
> >>>
> >>> Yes.
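[A side note for readers: although eviction is lease-time only by default, HttpClient 4.4+ does ship an opt-in background evictor on the builder. A minimal sketch; the TTL and idle values are illustrative, not taken from the thread.]

    import java.util.concurrent.TimeUnit;

    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    // A daemon thread periodically sweeps the pool, closing expired
    // connections (and those idle longer than 5 minutes) instead of
    // waiting for the expiry check that happens on lease.
    CloseableHttpClient client = HttpClients.custom()
            .setConnectionTimeToLive(1, TimeUnit.HOURS)
            .evictExpiredConnections()
            .evictIdleConnections(5, TimeUnit.MINUTES)
            .build();

[Background eviction smooths when expired connections get closed, but it does not by itself stagger their expiry times, which is the concern raised next.]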
> >>>> If the former, my concern is that our clients perform multiple transactions per second. If a connection pool has, say, 50 connections (mostly created very close together in time, due to high throughput at startup), and my client is executing 25 requests/sec, then theoretically, once the TTL elapses, every connection in the pool would likely be leased within a few seconds (and thus closed as expired). Am I off base?
> >>>
> >>> No, I think you are not. However, for such a condition to occur one would need to completely exhaust the entire pool of connections practically instantaneously, which is not very likely (*), because the pool manager always tries to reuse the newest connections first.
> >>>
> >>> (*) The only problem might be the slow setup of TLS-encrypted connections.
> >>>
> >>> Oleg
> >>>
> >>>> Regards,
> >>>>
> >>>> Aaron Curley
> >>
> >> Hi Oleg,
> >>
> >> Thanks for the clarifications. I am indeed using TLS, which is (partly) why I'm so concerned about new-connection set-up time.
> >>
> >> Because of my rather high number of operations per second, I'm still a bit worried about exhausting a large percentage of the pool's connections over a short period of time. I will see if I can do some testing to find out whether this is a problem in practice.
> >>
> >> Planning ahead, if I were to submit a patch, is the HttpClient 4.x branch still open for enhancements, or is it mainly accepting bug fixes at this point?
> >
> > Hi Aaron
> >
> > The 4.5.x branch should be used for bug fix releases only. We ought not add new features to it. Theoretically we could create a 4.6.x branch, but I would very much prefer not to. There will be no problem adding this feature to 5.0 (trunk).
> >
> > Oleg
>
> Hi Oleg,
>
> > The 4.5.x branch should be used for bug fix releases only. We ought not add new features to it.
>
> Yep! This makes perfect sense.
>
> > Theoretically we could create a 4.6.x branch, but I would very much prefer not to. There will be no problem adding this feature to 5.0 (trunk).
>
> Acknowledged. From my perspective, I hesitate to make a migration to 5.x a prerequisite for this. But right now I don't think it would be reasonable for me to push for a 4.6.x branch, if for no other reason than that I still need to do my homework. :-)
>
> As soon as time permits, I'll set up a test to see whether simultaneous connection closure (due to TTL expiry) can actually occur under high throughput. I'm a bit maxed out right now, so I probably won't have any results for a good while. If this does turn out to be a problem, perhaps we can then re-evaluate where a possible enhancement should go, based on the status of the 5.x release at that point?
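[Purely as an illustration, not something proposed in the thread: the expiration "window" idea can be approximated today at the keep-alive level, since ConnectionKeepAliveStrategy is consulted on every response. A sketch assuming Java 7+; the 20-30 minute window is invented for illustration, and, as the thread explains, this bounds idle time only, not total connection lifetime.]

    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.TimeUnit;

    import org.apache.http.HttpResponse;
    import org.apache.http.conn.ConnectionKeepAliveStrategy;
    import org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy;
    import org.apache.http.protocol.HttpContext;

    // Jittered keep-alive: each response gets a randomized idle expiry
    // somewhere in a 20-30 minute window, so pooled connections do not
    // all become stale at the same instant.
    ConnectionKeepAliveStrategy jittered = new ConnectionKeepAliveStrategy() {
        @Override
        public long getKeepAliveDuration(HttpResponse response, HttpContext context) {
            long serverHint = DefaultConnectionKeepAliveStrategy.INSTANCE
                    .getKeepAliveDuration(response, context);
            long jitterMillis = ThreadLocalRandom.current().nextLong(
                    TimeUnit.MINUTES.toMillis(20), TimeUnit.MINUTES.toMillis(30));
            // Honor a shorter server-supplied Keep-Alive timeout if present.
            return serverHint > 0 ? Math.min(serverHint, jitterMillis) : jitterMillis;
        }
    };

[The strategy would be installed via HttpClients.custom().setKeepAliveStrategy(jittered). A busy connection that is re-leased before its window elapses keeps receiving fresh windows, which is precisely why a true per-entry TTL needs support inside the pool itself.]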
> Also, is there any (estimated) timeline for the 5.x release, or is it too early to know at this point?

I am working on the HTTP/2 transport implementation in trunk. So far the progress has been painful and slow (almost GRRM slow). I can hardly imagine a BETA release sooner than Q1 2017, and even that is probably optimistic.

Oleg