Re: [BUG?] git http connection reuse

2014-02-18 Thread Daniel Stenberg

On Tue, 18 Feb 2014, Jeff King wrote:

I think there is still an unrelated issue with curl_multi preventing 
connection reuse, but I'm not sure from what you say below...


I'm not clear whether you mean by this that it is _expected_ in my test 
program for curl not to reuse the connection. Or that curl may simply have 
to do a little more work, and it is still a bug that the connection is not 
reused.


Argh, sorry. I thought were still referring to the previous problem. I can 
indeed repeat the problem you talk about with your test code. Thanks! I'll get 
back to you.


--

 / daniel.haxx.se
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG?] git http connection reuse

2014-02-18 Thread Daniel Stenberg

On Tue, 18 Feb 2014, Jeff King wrote:

I'm not clear whether you mean by this that it is _expected_ in my test 
program for curl not to reuse the connection. Or that curl may simply have 
to do a little more work, and it is still a bug that the connection is not 
reused.


Okey, I checked this closer now and this is the full explanation to what 
happens. It seems to work as intended:


It's all about where the connection cache is held by libcurl. When you create 
a multi handle, it will create a connection cache that will automatically be 
shared by all easy handles that are added to it.


If you create an easy handle and make a curl_easy_perform() on that, it will 
create its own connection cache and keep it associated with this easy handle.


When first using an easy handle within a multi handle it will use the shared 
connection cache in there as long as it is in that multi handle family, but as 
soon as you remove it from there it will be detached from that connection 
cache.


Then, when doing a fresh request with easy_perform using the handle that was 
detached from the multi handle, it will create and use its own private cache 
as it can't re-use the previous connection that is cached within the multi 
handle.


We can certainly teach git to use the multi interface, even when doing a 
single blocking request.


For connection re-use purposes, that may make a lot of sense.

--

 / daniel.haxx.se
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG?] git http connection reuse

2014-02-17 Thread Jeff King
On Sun, Feb 16, 2014 at 02:33:06PM +0100, Daniel Stenberg wrote:

 On Sun, 16 Feb 2014, Daniel Stenberg wrote:
 
 Right, the problem is there to make sure that a NTLM-auth
 connection with different credentials aren't re-used. NTLM with its
 connection-oriented authentication breaks the traditional HTTP
 paradigms and before this change there was a risk that libcurl
 would wrongly re-use a NTLM connection that was done with different
 credentials!
 
 I suspect we introduced a regression here with that fix. I'll dig into this.
 
 Thanks for pointing this out!
 
 I could repeat this problem with a curl test case so I wrote up two,
 and then I made a fix to curl:
 
   https://github.com/bagder/curl/commit/d765099813f5

Thanks for the quick turnaround. I've confirmed that your patch fixes
both my limited test case and Git itself (when using just the curl_easy
interface).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG?] git http connection reuse

2014-02-17 Thread Jeff King
On Sun, Feb 16, 2014 at 01:18:52PM +0100, Daniel Stenberg wrote:

 On Sat, 15 Feb 2014, Kyle J. McKay wrote:
 
 If pipelining is off (the default) and total connections is not 1
 it sounds to me from the description above that the requests will
 be executed on separate connections until the maximum number of
 connections is in use and then there might be some reuse.
 
 Not exactly. When about to do a request, libcurl will always try to
 find an existing idle but open connection in its connection pool to
 re-use. With the multi interface you can of course easily start N
 requests at once to the same host and then they'll only re-use
 connections to the extent there are connections to pick, otherwise
 it'll create new connections.

Right; I'd expect multiple connections for parallel requests, but in
this case we are completing the first and removing the handle before
starting the second. Digging further, I was able to reproduce the
behavior with a simple program:

-- 8 --
#include stdio.h
#include stdlib.h
#include curl/curl.h

int main(int argc, char **argv)
{
CURL *curl = curl_easy_init();
CURLM *multi = curl_multi_init();
int want_multi = atoi(argv[2]);
int i;

curl_easy_setopt(curl, CURLOPT_URL, argv[1]);
curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);

for (i = 0; i  2; i++) {
if (i  want_multi) {
int nr;

curl_multi_add_handle(multi, curl);
do {
curl_multi_perform(multi, nr);
} while (nr);
curl_multi_remove_handle(multi, curl);
}
else
curl_easy_perform(curl);
}

return 0;
}
-- 8 --

The program just requests the same URL twice using the same curl handle,
optionally using the multi interface for zero, one, or both of the
requests. For 0 and 2 (i.e., either both curl_easy or both
curl_multi), the connection is reused. But for 1 (in which we use
multi for the first request, but easy for the second), we do not reuse
the connection.

The manpage for curl_multi_add_handle does say:

  When an easy handle has been added to a multi stack, you can not
  and you must not use curl_easy_perform(3) on that handle!

Does that apply to the handle after it has finished its transaction and
been removed from the multi object (in which case git is Doing It
Wrong)? Is the program above failing to take some step to finalize the
first request, so that the second one knows its connection can be
reused? Or should curl be handling this?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG?] git http connection reuse

2014-02-17 Thread Daniel Stenberg

On Mon, 17 Feb 2014, Jeff King wrote:

Right; I'd expect multiple connections for parallel requests, but in this 
case we are completing the first and removing the handle before starting the 
second. Digging further, I was able to reproduce the behavior with a simple 
program:


Yeah, given your description I had no problems to repeat it either. Turns out 
we had no decent test case that checked for this so in our eagerness to fix a 
security problem involving over-reuse we broke this simpler reuse case. Two 
steps forward, one step backward... :-/



The manpage for curl_multi_add_handle does say:

 When an easy handle has been added to a multi stack, you can not
 and you must not use curl_easy_perform(3) on that handle!

Does that apply to the handle after it has finished its transaction and been 
removed from the multi object (in which case git is Doing It Wrong)?


No it doesn't. The man page should probably be clarified to express that 
slightly better. It just means that _while_ a handle is added to a multi 
handle it cannot be used with curl_easy_perform().


So yes, you can remove indeed it from the handle and then do 
curl_easy_perform(). It works just fine!


Several internal caches are kept in the multi handle when that's used though, 
so when getting the easy handle out after having used it with the multi 
interface and then using it with the easy interface may cause libcurl to do a 
little more work than if you would be able to re-add it to the same multi 
handle do to the operation with that...


--

 / daniel.haxx.se
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG?] git http connection reuse

2014-02-17 Thread Jeff King
On Tue, Feb 18, 2014 at 08:13:16AM +0100, Daniel Stenberg wrote:

 Right; I'd expect multiple connections for parallel requests, but
 in this case we are completing the first and removing the handle
 before starting the second. Digging further, I was able to
 reproduce the behavior with a simple program:
 
 Yeah, given your description I had no problems to repeat it either.
 Turns out we had no decent test case that checked for this so in our
 eagerness to fix a security problem involving over-reuse we broke
 this simpler reuse case. Two steps forward, one step backward... :-/

You are talking about the NTLM fix here, right?

I think there is still an unrelated issue with curl_multi preventing
connection reuse, but I'm not sure from what you say below...

 Does that apply to the handle after it has finished its transaction
 and been removed from the multi object (in which case git is Doing
 It Wrong)?
 
 No it doesn't. The man page should probably be clarified to express
 that slightly better. It just means that _while_ a handle is added to
 a multi handle it cannot be used with curl_easy_perform().

Thanks for the clarification. That was my guess from reading it, but
given that it wasn't working as expected, I wondered if we were
violating the rules. I saw the documentation fix you just pushed; it
looks reasonable to me.

 Several internal caches are kept in the multi handle when that's used
 though, so when getting the easy handle out after having used it with
 the multi interface and then using it with the easy interface may
 cause libcurl to do a little more work than if you would be able to
 re-add it to the same multi handle do to the operation with that...

I'm not clear whether you mean by this that it is _expected_ in my test
program for curl not to reuse the connection. Or that curl may simply
have to do a little more work, and it is still a bug that the connection
is not reused.

We can certainly teach git to use the multi interface, even when doing a
single blocking request.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG?] git http connection reuse

2014-02-16 Thread Daniel Stenberg

On Sat, 15 Feb 2014, Kyle J. McKay wrote:

If pipelining is off (the default) and total connections is not 1 it sounds 
to me from the description above that the requests will be executed on 
separate connections until the maximum number of connections is in use and 
then there might be some reuse.


Not exactly. When about to do a request, libcurl will always try to find an 
existing idle but open connection in its connection pool to re-use. With the 
multi interface you can of course easily start N requests at once to the same 
host and then they'll only re-use connections to the extent there are 
connections to pick, otherwise it'll create new connections.



Daniel Stenberg (7 Jan 2014)
- ConnectionExists: fix NTLM check for new connection


Looks like you're just lucky as that above change first appears in 7.35.0. 
But it seems there are some patches for older versions so they might be 
affected as well [2].


Right, the problem is there to make sure that a NTLM-auth connection with 
different credentials aren't re-used. NTLM with its connection-oriented 
authentication breaks the traditional HTTP paradigms and before this change 
there was a risk that libcurl would wrongly re-use a NTLM connection that was 
done with different credentials!


I suspect we introduced a regression here with that fix. I'll dig into this.

--

 / daniel.haxx.se
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG?] git http connection reuse

2014-02-16 Thread Daniel Stenberg

On Sun, 16 Feb 2014, Daniel Stenberg wrote:

Right, the problem is there to make sure that a NTLM-auth connection with 
different credentials aren't re-used. NTLM with its connection-oriented 
authentication breaks the traditional HTTP paradigms and before this change 
there was a risk that libcurl would wrongly re-use a NTLM connection that 
was done with different credentials!


I suspect we introduced a regression here with that fix. I'll dig into this.


Thanks for pointing this out!

I could repeat this problem with a curl test case so I wrote up two, and then 
I made a fix to curl:


  https://github.com/bagder/curl/commit/d765099813f5

--

 / daniel.haxx.se
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG?] git http connection reuse

2014-02-15 Thread Kyle J. McKay

On Feb 15, 2014, at 20:05, Jeff King wrote:
I've noticed that git does not reuse http connections when fetching,  
and

I'm trying to figure out why. It seems related to our use of two curl
features:

 1. If we use the curl_multi interface (even though we are doing the
requests sequentially), we do not get reuse.


Take a look at the curl_multi_setopt page [1].  I think these explain  
the curl_multi issue:



CURLMOPT_PIPELINING

Pass a long set to 1 to enable or 0 to disable. Enabling pipelining  
on a multi handle will make it attempt to perform HTTP Pipelining as  
far as possible for transfers using this handle. This means that if  
you add a second request that can use an already existing  
connection, the second request will be piped on the same  
connection rather than being executed in parallel. (Added in 7.16.0)



CURLMOPT_MAX_TOTAL_CONNECTIONS

Pass a long. The set number will be used as the maximum amount of  
simultaneously open connections in total. For each new session,  
libcurl will open a new connection up to the limit set by  
CURLMOPT_MAX_TOTAL_CONNECTIONS. When the limit is reached, the  
sessions will be pending until there are available connections. If  
CURLMOPT_PIPELINING is 1, libcurl will try to pipeline if the host  
is capable of it.


The default value is 0, which means that there is no limit. However,  
for backwards compatibility, setting it to 0 when  
CURLMOPT_PIPELINING is 1 will not be treated as unlimited. Instead  
it will open only 1 connection and try to pipeline on it.


(Added in 7.30.0)


If pipelining is off (the default) and total connections is not 1 it  
sounds to me from the description above that the requests will be  
executed on separate connections until the maximum number of  
connections is in use and then there might be some reuse.  That might  
not be what's actually going on with multi, but that's my guess and I  
think setting CURLMOPT_PIPELINING to to 1 and then also setting  
CURLMOPT_MAX_TOTAL_CONNECTIONS to a non-zero value might be what you  
want.



 2. If we set CURLOPT_HTTPAUTH to CURLAUTH_ANY, we do not get reuse.

[snip]

My curl version is 7.35.0, if that makes a difference.


And that is explained by this from the CHANGES file:


Daniel Stenberg (7 Jan 2014)
- ConnectionExists: fix NTLM check for new connection

  When the requested authentication bitmask includes NTLM, we cannot
  re-use a connection for another username/password as we then risk
  re-using NTLM (connection-based auth).

  This has the unfortunate downside that if you include NTLM as a  
possible
  auth, you cannot re-use connections for other usernames/passwords  
even

  if NTLM doesn't end up the auth type used.

  Reported-by: Paras S
  Patched-by: Paras S
  Bug: http://curl.haxx.se/mail/lib-2014-01/0046.html


Looks like you're just lucky as that above change first appears in  
7.35.0.  But it seems there are some patches for older versions so  
they might be affected as well [2].


Adjusting your test program to leave the first connection set to  
CURLAUTH_ANY, but then setting the second one to CURLAUTH_ANY with the  
two NTLM bits turned off (CURLAUTH_NTLM and CURLAUTH_NTLM_WB) results  
in the connection being reused for me. :)


With the older cURL versions I already had installed, the second  
connection is always reused.  I didn't see the non-reuse with your  
sample code until I fetched 7.35.0.


--Kyle

[1] http://curl.haxx.se/libcurl/c/curl_multi_setopt.html
[2] http://curl.haxx.se/docs/adv_20140129.html

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html