[ https://issues.apache.org/jira/browse/THRIFT-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013952#comment-15013952 ]

Justin Larrabee commented on THRIFT-3429:
-----------------------------------------

Again, you don't get to re-use the same connection within each goroutine -- 
that is not how the Go stdlib http package works. The http.Client instance 
that all those goroutines share manages all the keep-alive connections, 
subject to how large MaxIdleConnsPerHost is AND whether you properly read 
and close each response body. Every time you call client.Do, the 
http.Client looks for a keep-alive connection in its pool; if it finds one 
you get to use it, otherwise a new connection is made.
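
To make that concrete, here is a minimal sketch of the read-then-close 
pattern that lets an http.Client return a connection to its pool (the 
helper name is mine, purely for illustration; io.Discard is the modern 
spelling, ioutil.Discard in older Go):

    import (
        "io"
        "net/http"
    )

    // doAndRelease issues a request and immediately hands the
    // connection back to the client's pool by draining and closing
    // the response body.
    func doAndRelease(client *http.Client, req *http.Request) error {
        resp, err := client.Do(req)
        if err != nil {
            return err
        }
        // A connection only becomes eligible for keep-alive reuse
        // after the body has been read to EOF and closed.
        io.Copy(io.Discard, resp.Body)
        return resp.Body.Close()
    }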

Let's assume you use the http.DefaultClient (this is the default now in 
THttpTransport) in your example. Its MaxIdleConnsPerHost is 2, so it keeps 
at most two idle keep-alive connections per host. You spin up 1000 
THttpClients, each calls Flush once, and then they all sit around without 
Close() ever being called. The way THttpClient currently works, if your 
client is set for keep-alives, each of those connections is held onto until 
it is closed by the remote host or released by a call to Flush or Close. If 
you spin up another 1000 THttpClients, you get another 1000 connections. 
Now let's say Flush is called on 500 of those THttpClients. The cached 
response body is read and closed on each one, and at that point the 
connection is returned to the http.Client's pool. Because 
MaxIdleConnsPerHost is 2, in the pessimistic case only TWO of those 500 
connections are kept, despite the fact that you marked all 500 for 
keep-alive. So when those goroutines next make a request, 2 of them get the 
kept-alive connections and the other 498 get to make new ones.
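
From the caller's side, each of those goroutines looks roughly like this 
(sketch only -- the package path is the Thrift Go library, and Flush has 
since gained a context parameter in newer Thrift versions):

    import "github.com/apache/thrift/lib/go/thrift"

    // oneCall sketches what each goroutine in the example does. The
    // function name is mine, purely for illustration.
    func oneCall(urlStr string) error {
        trans, err := thrift.NewTHttpClient(urlStr)
        if err != nil {
            return err
        }
        // Only Close (or a later Flush) releases the cached response
        // body, and with it the underlying TCP connection.
        defer trans.Close()
        // ... write the serialized Thrift request bytes to trans ...
        // Flush sends the HTTP request; the response body is cached
        // and the connection stays out of the http.Client pool until
        // Close is called.
        return trans.Flush()
    }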

Now, depending on the order in which the goroutines happen to execute, it 
is possible that more than two connections are kept, if they are returned 
and picked up again in a staggered sequence. In practice you cannot rely on 
this, since the scheduler is free to do whatever it wants.

My suggestion in this issue would change the behavior of the example I 
quoted by returning the HTTP connection to the pool immediately -- meaning 
all the other goroutines have a much better chance of getting an existing 
keep-alive connection instead of spinning up 998 disposable ones. Combined 
with a larger MaxIdleConnsPerHost setting in the http.Client, you make 
much, much better use of your resources and avoid the immense overhead of 
setting up those connections on every single request.
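
Roughly, that combination would look like this (the pool size is 
arbitrary, and THttpClientOptions comes from the THRIFT-3405 patch):

    import (
        "net/http"

        "github.com/apache/thrift/lib/go/thrift"
    )

    // A shared client whose pool is large enough to actually hold the
    // idle connections we want to reuse.
    var sharedClient = &http.Client{
        Transport: &http.Transport{
            MaxIdleConnsPerHost: 100, // the stdlib default is 2
        },
    }

    // newTransport builds a THttpClient on top of the shared pool.
    // The function name is mine, purely for illustration.
    func newTransport(urlStr string) (thrift.TTransport, error) {
        return thrift.NewTHttpClientWithOptions(urlStr,
            thrift.THttpClientOptions{Client: sharedClient})
    }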


> Go THttpClient option to fully read/close HTTP response body after Flush
> ------------------------------------------------------------------------
>
>                 Key: THRIFT-3429
>                 URL: https://issues.apache.org/jira/browse/THRIFT-3429
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Go - Library
>            Reporter: Justin Larrabee
>            Priority: Minor
>
> Currently the THttpTransport holds onto the HTTP response body until the 
> owner calls Close(). When using keep-alive, this has the side effect of 
> not releasing the TCP connection back to the HTTP client's pool until 
> Close() is called. When creating many concurrent THttpTransports, this 
> delay can cause a lot of additional connections to be created.
>
> I would suggest that the transport be configurable to fully read and 
> close the HTTP response body after a Flush() call so that the connection 
> is returned to the pool as quickly as possible.
>
> I would love some opinions on this suggestion. My patch for issue 3405 
> added a THttpClientOptions struct to make it easy to support additional 
> features like this improvement without needing to alter the new public 
> constructors.
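
Inside Flush, the opt-in drain the issue proposes could look something 
like this helper (a sketch under my own naming -- the real THttpClient 
internals differ):

    import (
        "bytes"
        "io"
        "net/http"
    )

    // drainBody reads the entire response body into memory and closes
    // it, so the connection goes back to the http.Client pool right
    // away. Later Read calls on the transport can then be served from
    // the in-memory copy instead of the live connection.
    func drainBody(resp *http.Response) error {
        buf, err := io.ReadAll(resp.Body)
        closeErr := resp.Body.Close() // connection is poolable from here
        if err != nil {
            return err
        }
        if closeErr != nil {
            return closeErr
        }
        resp.Body = io.NopCloser(bytes.NewReader(buf))
        return nil
    }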


