On Thu, Apr 14, 2016 at 05:32:16PM -0400, Isaac Levy wrote:

> After the authenticated request, curl says it's keeping the connection
> open, but the next fetch seems to do two handshakes again.  The
> unauthenticated request closes the connection, so the 2nd handshake is
> forced, but I'm not sure why subsequent git fetches still do
> handshakes.  I did a bit of sleuthing w/ source, GIT_CURL_VERBOSE and
> wireshark.
> 
> We're using GitHub enterprise -- it'd just be nice if there was a
> better way to configure for super fast fetches.  ssh with cached
> connections does avoid the SSL this overhead -- though as I recall git
> protocols over ssh are less performant.

It's the opposite; generally git is more efficient over ssh, because we
have a true full-duplex connection, and we can do everything over a
single session (though setup for the ssh session may or may not be
faster than SSL).

Here's what I observed just for the initial contact. If I instrument git
like this:

diff --git a/http.c b/http.c
index 4304b80..9bedad7 100644
--- a/http.c
+++ b/http.c
@@ -1422,6 +1422,7 @@ static int http_request(const char *url,
        curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, headers);
        curl_easy_setopt(slot->curl, CURLOPT_ENCODING, "gzip");
 
+       warning("running one curl slot...");
        ret = run_one_slot(slot, &results);
 
        if (options && options->content_type) {

and run:

  GIT_CURL_VERBOSE=1 git ls-remote 2>&1 | egrep '< HTTP|> |^Auth|^warn'

I get three requests:

  warning: running one curl slot...
  > GET /my/repo/info/refs?service=git-upload-pack HTTP/1.1
  < HTTP/1.1 401 Authorization Required
  warning: running one curl slot...
  > GET /my/repo/info/refs?service=git-upload-pack HTTP/1.1
  < HTTP/1.1 401 Authorization Required
  > GET /my/repo/info/refs?service=git-upload-pack HTTP/1.1
  Authorization: Basic ...
  < HTTP/1.1 200 OK

The first request is us (git) asking curl to hit the URL, and curl
returns the 401 from the server to us. At that point we'll prompt the
user (or the credential helper) for the password, and then we'll give
that to curl and ask it to make the request again. Curl will do the
probe with the 401, because we haven't set an auth type, and then follow
up (without returning the 401 to git) with the right credentials.

I think we can take that down to _two_ requests pretty easily. We know
in the very first request that the server told us something like:

  < WWW-Authenticate: Basic realm="GitHub"

but curl doesn't remember that. However, we should be able to pull it
out of the old request and feed it into the new one. That would save the
second request, which is just a probe.

But ideally we'd have this down to one request. I think we could do
something like this:

diff --git a/http.c b/http.c
index 9bedad7..7e5009d 100644
--- a/http.c
+++ b/http.c
@@ -870,6 +870,11 @@ struct active_request_slot *get_active_slot(void)
 #ifdef LIBCURL_CAN_HANDLE_AUTH_ANY
        curl_easy_setopt(slot->curl, CURLOPT_HTTPAUTH, http_auth_methods);
 #endif
+
+       /* XXX should be configurable via http.* or whatever */
+       curl_easy_setopt(slot->curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
+       credential_fill(&http_auth);
+
        if (http_auth.password || curl_empty_auth)
                init_curl_http_auth(slot->curl);
 

My request then looks like:

  warning: running one curl slot...
  > GET /my/repo/info/refs?service=git-upload-pack HTTP/1.1
  Authorization: Basic ...
  < HTTP/1.1 200 OK

with just a single round-trip.

Obviously the hard-coding there is wrong, but the mental model here is
that the user would do something like:

  git config http.https://github.com.auth basic

to say "I know I want to do Basic auth for this host", and that would
trigger git to tell that to curl, and to prompt for the password
preemptively.  The obvious downside is that it requires manual
configuration, and some clue about "Basic" versus other auth types.

I didn't trace the protocol further, but I suspect that curl ends up
doing that probe for _each_ request, because each time we hand it only
the credential without telling it that we know the server wants "Basic"
auth.

So that tries to keep the number of requests down in the first place. As
far as reusing connections for the connections we _do_ make, I'm not
sure. I do see curl claiming to leave the connection up:

  * Connection #0 to host github.com left intact

but then we seem to make a separate one for the next request:

  * Hostname github.com was found in DNS cache
  *   Trying 192.30.252.120...
  * Connected to github.com (192.30.252.120) port 443 (#1)

I'm not sure why that is. We do get SSL reuse:

  * SSL re-using session ID
  * SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256

which is good. And then of the two requests curl makes later, the second
one has:

  * Re-using existing connection! (#1) with host github.com

which is good. So I'm not sure why we don't reuse connection #0 from the
very first request. Perhaps it's something in the way we are setting up
the curl handle.

> Finally I also checked out the persistent-https contrib section as a
> workaround but couldn't get it to work. Is that project dead?

I don't know. It certainly hasn't seen a lot of activity lately, but
Google folks might still be using it (I think the original impetus was
for fetching of many repos from the same server for the Android
project). I don't know that we've seen any mention of wide use on the
list.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to