Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-06 Thread Jeff King
On Thu, Apr 06, 2017 at 07:24:54PM +0200, Christian Couder wrote:

> > That would at least tell you if the problem is the chunked encoding, or
> > if it's related to the size.
> 
> The above commands work for me using gitlab.com and the log shows:
> 
> Send header, 000309 bytes (0x0135)
> Send header: POST /chriscool/yet-another-test-project.git/git-receive-pack HTTP/1.1
> Send header: Authorization: Basic 
> Send header: User-Agent: git/2.12.2.625.g14da1346c9.dirty
> Send header: Host: gitlab.com
> Send header: Content-Type: application/x-git-receive-pack-request
> Send header: Accept: application/x-git-receive-pack-result
> Send header: Content-Length: 4
> 
> Send header, 000341 bytes (0x0155)
> Send header: POST /chriscool/yet-another-test-project.git/git-receive-pack HTTP/1.1
> Send header: Authorization: Basic 
> Send header: User-Agent: git/2.12.2.625.g14da1346c9.dirty
> Send header: Host: gitlab.com
> Send header: Accept-Encoding: gzip
> Send header: Content-Type: application/x-git-receive-pack-request
> Send header: Accept: application/x-git-receive-pack-result
> Send header: Transfer-Encoding: chunked
> 
> Maybe the reverse proxy doesn't like it when the push is really big.

Interesting. So it is OK with the chunked encoding. It seems odd that it
would complain about a bigger chunked encoding, but then work correctly
with a single big buffer. But I guess it would all depend on what kind
of buffering logic the reverse proxy uses.

-Peff


Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-06 Thread Christian Couder
On Tue, Apr 4, 2017 at 10:40 PM, Jeff King  wrote:
> On Tue, Apr 04, 2017 at 06:42:23PM +0000, David Turner wrote:
>
> > What does it look like when it fails? What does GIT_TRACE_CURL look like
> > (or GIT_CURL_VERBOSE if your client is older, but remember to sanitize
> > any auth lines)?
>>
> Unfortunately, we've already worked around the problem by pushing over SSH,
> so I no longer have a failing case to examine. Last time I tried, I actually
> did some hackery to create a push smaller than 2GB, but it still failed
> (this time, with "502 Bad Gateway").  So, something is clearly weird in
> GitLab land.
>>
> I did see "Transfer-Encoding: chunked" in one of the responses from the
> server, but not in the request (not sure if that's normal). The smaller
> push had: Content-Length: 1048908476
>
> The 502 makes me think it's a problem in the GitLab reverse-proxy layer
> (and also my experience debugging Git-over-HTTP weirdness on GitHub's reverse
> proxy layer, which had a number of pitfalls ;) ).

Yeah, maybe.

> You should be able to do a synthetic test like:
>
>   git init
>   dd if=/dev/urandom of=foo.rand bs=1k count=1024
>   git add .
>   git commit -m 'random megabyte'
>   GIT_TRACE_CURL=/tmp/foo.out \
>     git -c http.postbuffer=0 push https://...
>
> You should see two POSTs to /git-receive-pack, like this:
>
>   Send header: POST /peff/test.git/git-receive-pack HTTP/1.1
>   Send header: Host: github.com
>   Send header: Authorization: Basic 
>   Send header: User-Agent: git/2.12.2.952.g759391acc
>   Send header: Content-Type: application/x-git-receive-pack-request
>   Send header: Accept: application/x-git-receive-pack-result
>   Send header: Content-Length: 4
>
>   Send header: POST /peff/test.git/git-receive-pack HTTP/1.1
>   Send header: Host: github.com
>   Send header: Authorization: Basic 
>   Send header: User-Agent: git/2.12.2.952.g759391acc
>   Send header: Accept-Encoding: gzip
>   Send header: Content-Type: application/x-git-receive-pack-request
>   Send header: Accept: application/x-git-receive-pack-result
>   Send header: Transfer-Encoding: chunked
>
> The first is a probe to make sure we can hit the endpoint without
> sending the whole payload. And the second should pass up the 1MB
> packfile in chunks.
>
> That would at least tell you if the problem is the chunked encoding, or
> if it's related to the size.

The above commands work for me using gitlab.com and the log shows:

Send header, 000309 bytes (0x0135)
Send header: POST /chriscool/yet-another-test-project.git/git-receive-pack HTTP/1.1
Send header: Authorization: Basic 
Send header: User-Agent: git/2.12.2.625.g14da1346c9.dirty
Send header: Host: gitlab.com
Send header: Content-Type: application/x-git-receive-pack-request
Send header: Accept: application/x-git-receive-pack-result
Send header: Content-Length: 4

Send header, 000341 bytes (0x0155)
Send header: POST /chriscool/yet-another-test-project.git/git-receive-pack HTTP/1.1
Send header: Authorization: Basic 
Send header: User-Agent: git/2.12.2.625.g14da1346c9.dirty
Send header: Host: gitlab.com
Send header: Accept-Encoding: gzip
Send header: Content-Type: application/x-git-receive-pack-request
Send header: Accept: application/x-git-receive-pack-result
Send header: Transfer-Encoding: chunked

Maybe the reverse proxy doesn't like it when the push is really big.


Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-04 Thread Jeff King
On Tue, Apr 04, 2017 at 06:42:23PM +0000, David Turner wrote:

> > What does it look like when it fails? What does GIT_TRACE_CURL look like (or
> > GIT_CURL_VERBOSE if your client is older, but remember to sanitize any auth
> > lines)?
> 
> Unfortunately, we've already worked around the problem by pushing over SSH,
> so I no longer have a failing case to examine. Last time I tried, I actually
> did some hackery to create a push smaller than 2GB, but it still failed
> (this time, with "502 Bad Gateway").  So, something is clearly weird in
> GitLab land.
>
> I did see "Transfer-Encoding: chunked" in one of the responses from the
> server, but not in the request (not sure if that's normal). The smaller
> push had: Content-Length: 1048908476

The 502 makes me think it's a problem in the GitLab reverse-proxy layer
(and also my experience debugging Git-over-HTTP weirdness on GitHub's reverse
proxy layer, which had a number of pitfalls ;) ).

You should be able to do a synthetic test like:

  git init
  dd if=/dev/urandom of=foo.rand bs=1k count=1024
  git add .
  git commit -m 'random megabyte'
  GIT_TRACE_CURL=/tmp/foo.out \
    git -c http.postbuffer=0 push https://...

You should see two POSTs to /git-receive-pack, like this:

  Send header: POST /peff/test.git/git-receive-pack HTTP/1.1
  Send header: Host: github.com
  Send header: Authorization: Basic 
  Send header: User-Agent: git/2.12.2.952.g759391acc
  Send header: Content-Type: application/x-git-receive-pack-request
  Send header: Accept: application/x-git-receive-pack-result
  Send header: Content-Length: 4

  Send header: POST /peff/test.git/git-receive-pack HTTP/1.1
  Send header: Host: github.com
  Send header: Authorization: Basic 
  Send header: User-Agent: git/2.12.2.952.g759391acc
  Send header: Accept-Encoding: gzip
  Send header: Content-Type: application/x-git-receive-pack-request
  Send header: Accept: application/x-git-receive-pack-result
  Send header: Transfer-Encoding: chunked

The first is a probe to make sure we can hit the endpoint without
sending the whole payload. And the second should pass up the 1MB
packfile in chunks.
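
For illustration, here is a minimal standalone libcurl sketch (not git's
actual remote-curl.c; the URL and four-byte payload are placeholders) of how
a chunked POST like the second request is produced: register a read
callback, add the Transfer-Encoding header yourself, and set no
Content-Length:

  #include <curl/curl.h>
  #include <string.h>

  struct src { const char *buf; size_t len, off; };

  static size_t read_cb(char *dst, size_t size, size_t nmemb, void *userp)
  {
          struct src *s = userp;
          size_t n = size * nmemb;

          if (n > s->len - s->off)
                  n = s->len - s->off;
          memcpy(dst, s->buf + s->off, n);
          s->off += n;
          return n; /* returning 0 ends the chunked body */
  }

  int main(void)
  {
          struct src body = { "0000", 4, 0 }; /* placeholder payload */
          struct curl_slist *hdrs;
          CURL *curl;

          curl_global_init(CURL_GLOBAL_DEFAULT);
          curl = curl_easy_init();
          hdrs = curl_slist_append(NULL, "Transfer-Encoding: chunked");

          curl_easy_setopt(curl, CURLOPT_URL,
                           "https://example.com/test.git/git-receive-pack");
          curl_easy_setopt(curl, CURLOPT_POST, 1L);
          curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
          curl_easy_setopt(curl, CURLOPT_READFUNCTION, read_cb);
          curl_easy_setopt(curl, CURLOPT_READDATA, &body);
          curl_easy_perform(curl);

          curl_slist_free_all(hdrs);
          curl_easy_cleanup(curl);
          curl_global_cleanup();
          return 0;
  }

With no Content-Length available, curl has to frame the body itself, which
is exactly what the chunked encoding is for.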

That would at least tell you if the problem is the chunked encoding, or
if it's related to the size.

> (For me to publish longer log traces requires a level of security review
> which is probably too much of a hassle unless you think it will be really
> useful).

Nah, I doubt there's much to see except "did a small chunked transfer
work", and anything relevant you can pick out of the server response
(but probably "502" is the extent of it).

> > IMHO, complaining about the negative number to the user would be an
> > improvement.
> 
> That seems reasonable.

You can do that with:

  if (http_post_buffer < 0)
          die("negative http.postBuffer not allowed");

but I was trying to suggest that using git_parse_unsigned() should
detect that error for you. It doesn't seem to, though! The strtoumax()
function happily converts negatives into their twos-complement
wraparounds. We could detect it by looking for a leading "-" ourselves,
though I wonder if anybody is relying on the "-1" behavior.
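
For example, a standalone demo (not git code) of both the wraparound and
the leading "-" check:

  #include <ctype.h>
  #include <inttypes.h>
  #include <stdio.h>

  int main(void)
  {
          const char *value = " -1";

          /* strtoumax() does not reject "-1"; it wraps to UINTMAX_MAX */
          printf("%ju\n", strtoumax("-1", NULL, 10));

          /* a possible guard: skip whitespace, then reject a leading "-" */
          while (isspace((unsigned char)*value))
                  value++;
          if (*value == '-')
                  printf("rejected negative value\n");
          return 0;
  }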

-Peff


RE: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-04 Thread David Turner
> -Original Message-
> From: Jeff King [mailto:p...@peff.net]
> Sent: Monday, April 3, 2017 10:02 PM
> To: David Turner <david.tur...@twosigma.com>
> Cc: git@vger.kernel.org
> Subject: Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values
> 
> On Mon, Apr 03, 2017 at 05:03:49PM +0000, David Turner wrote:
> 
> > > Unfortunately, in order to push some large repos, the http
> > > postbuffer must sometimes exceed two gigabytes.  On a 64-bit system,
> > > this is OK: we just malloc a larger buffer.
> > >
> > > I'm still not sure why a 2GB post-buffer is necessary. It sounds
> > > like something is broken in your setup. Large pushes should be sent
> > > chunked.
> > >
> > > I know broken setups are a fact of life, but this feels like a
> > > really hacky work-around.
> >
> > I'm not sure what other workaround I should use.  I guess I could do
> > multiple pushes, but only if individual objects are under the size
> > limit, and I'm not sure all of mine are (maybe I'll get lucky tho).  I
> > know that this is a configuration issue with gitlab:
> > https://gitlab.com/gitlab-org/gitlab-ce/issues/30315 but I don't know
> > when that will get fixed.  I could manually copy the repo to the
> > server and do a local push, but I don't know that I have the necessary
> > permissions to do that. Or I could do this, which would hopefully
> > actually solve the problem.
> 
> I didn't think we had gotten details on what was actually broken. Is it
> really that GitLab does not support chunked transfers? I know that's what
> the issue above says, but it sounds like it is just assuming that is the
> problem based on the recent messages to the list.

I agree that we don't know for sure what's actually broken.  I think the
GitLab bug tracker is probably the better place to figure that out, unless
GitLab thinks they're hitting a bug in git.

> If that's really the issue, then OK. That's lame, but something the client
> has to work around. It seems like a pretty big gap, though (and one I'd
> think people would have hit before; the default post-buffer is only 1MB.
> Surely people routinely push more than that to GitLab servers?) So I'm
> really wondering if there is something else going on here.
> 
> What does it look like when it fails? What does GIT_TRACE_CURL look like (or
> GIT_CURL_VERBOSE if your client is older, but remember to sanitize any auth
> lines)?

Unfortunately, we've already worked around the problem by pushing over SSH,
so I no longer have a failing case to examine. Last time I tried, I actually
did some hackery to create a push smaller than 2GB, but it still failed
(this time, with "502 Bad Gateway").  So, something is clearly weird in
GitLab land.

I did see "Transfer-Encoding: chunked" in one of the responses from the
server, but not in the request (not sure if that's normal). The smaller
push had: Content-Length: 1048908476

(For me to publish longer log traces requires a level of security review
which is probably too much of a hassle unless you think it will be really
useful).

> > > The ultimate fate of this number, though, is to be handed to:
> > >
> > >   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, rpc->len);
> > >
> > > where the final argument is interpreted as a long. So I suspect that
> > > on 64-bit Windows, setting http.postbuffer to "3G" would cause some
> > > kind of weird error (either a truncated post or some internal curl
> > > error due to the negative size, depending on how curl handles it).
> >
> > Ah, so we would need to use CURLOPT_POSTFIELDSIZE_LARGE.  Will re-roll.
> 
> Ah, neat. I didn't even know about CURLOPT_POSTFIELDSIZE_LARGE, and
> thought we'd have to just limit 32-bit platforms. That's a much better
> solution.
> 
> > > I saw the earlier iteration used a size_t, but you switched it after
> > > the compiler (rightfully) complained about the signedness. But I'm not
> > > sure why we would want ssize_t here instead of just using
> > > git_parse_unsigned().
> >
> > It was originally signed.  I'm not sure why that was, but I figured it
> > would be simpler to save the extra bit just in case.
> 
> I think it was simply because git_config_int() is the generic "number"
> parser, and nobody ever thought about it.
> 
> In fact, passing a negative value to curl would be disastrous, as it would
> use strlen(). I don't think a negative value could ever get that far,
> though. It looks like the config code would silently turn a negative value
> into LARGE_PACKET_MAX.

I would stil

Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-03 Thread Jeff King
On Mon, Apr 03, 2017 at 05:03:49PM +0000, David Turner wrote:

> > > Unfortunately, in order to push some large repos, the http postbuffer
> > > must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
> > > we just malloc a larger buffer.
> > 
> > I'm still not sure why a 2GB post-buffer is necessary. It sounds like
> > something is broken in your setup. Large pushes should be sent chunked.
> >
> > I know broken setups are a fact of life, but this feels like a really
> > hacky work-around.
> 
> I'm not sure what other workaround I should use.  I guess I could do
> multiple pushes, but only if individual objects are under the size
> limit, and I'm not sure all of mine are (maybe I'll get lucky tho).  I
> know that this is a configuration issue with gitlab:
> https://gitlab.com/gitlab-org/gitlab-ce/issues/30315 but I don't know
> when that will get fixed.  I could manually copy the repo to the
> server and do a local push, but I don't know that I have the necessary
> permissions to do that. Or I could do this, which would hopefully
> actually solve the problem.

I didn't think we had gotten details on what was actually broken. Is it
really that GitLab does not support chunked transfers? I know that's
what the issue above says, but it sounds like it is just assuming that
is the problem based on the recent messages to the list.

If that's really the issue, then OK. That's lame, but something the
client has to work around. It seems like a pretty big gap, though (and
one I'd think people would have hit before; the default post-buffer is
only 1MB. Surely people routinely push more than that to GitLab servers?)
So I'm really wondering if there is something else going on here.

What does it look like when it fails? What does GIT_TRACE_CURL look like
(or GIT_CURL_VERBOSE if your client is older, but remember to sanitize
any auth lines)?

> > The ultimate fate of this number, though, is to be handed to:
> > 
> >   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, rpc->len);
> > 
> > where the final argument is interpreted as a long. So I suspect that on
> > 64-bit Windows, setting http.postbuffer to "3G" would cause some kind of
> > weird error (either a truncated post or some internal curl error due to
> > the negative size, depending on how curl handles it).
> 
> Ah, so we would need to use CURLOPT_POSTFIELDSIZE_LARGE.  Will re-roll.

Ah, neat. I didn't even know about CURLOPT_POSTFIELDSIZE_LARGE, and
thought we'd have to just limit 32-bit platforms. That's a much better
solution.

> > I saw the earlier iteration used a size_t, but you switched it after the
> > compiler (rightfully) complained about the signedness. But I'm not sure
> > why we would want ssize_t here instead of just using git_parse_unsigned().
> 
> It was originally signed.  I'm not sure why that was, but I figured it
> would be simpler to save the extra bit just in case.

I think it was simply because git_config_int() is the generic "number"
parser, and nobody ever thought about it.

In fact, passing a negative value to curl would be disastrous, as it
would use strlen(). I don't think a negative value could ever get that
far, though. It looks like the config code would silently turn a
negative value into LARGE_PACKET_MAX.
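
For reference, the clamping in question in http.c looks roughly like this
(a paraphrased sketch, not a verbatim quote):

  if (!strcmp("http.postbuffer", var)) {
          http_post_buffer = git_config_int(var, value);
          /* a negative (or merely small) value is silently raised here */
          if (http_post_buffer < LARGE_PACKET_MAX)
                  http_post_buffer = LARGE_PACKET_MAX;
          return 0;
  }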

IMHO, complaining about the negative number to the user would be an
improvement.

-Peff


RE: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-03 Thread David Turner


> -Original Message-
> From: Jeff King [mailto:p...@peff.net]
> Sent: Saturday, April 1, 2017 2:01 AM
> To: David Turner <david.tur...@twosigma.com>
> Cc: git@vger.kernel.org
> Subject: Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values
> 
> On Fri, Mar 31, 2017 at 01:26:31PM -0400, David Turner wrote:
> 
> > Unfortunately, in order to push some large repos, the http postbuffer
> > must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
> > we just malloc a larger buffer.
> 
> I'm still not sure why a 2GB post-buffer is necessary. It sounds like
> something is broken in your setup. Large pushes should be sent chunked.
>
> I know broken setups are a fact of life, but this feels like a really
> hacky work-around.

I'm not sure what other workaround I should use.  I guess I could do multiple 
pushes, but only if individual objects are under the size limit, and I'm not 
sure all of mine are (maybe I'll get lucky tho).  I know that this is a 
configuration issue with gitlab: 
https://gitlab.com/gitlab-org/gitlab-ce/issues/30315 but I don't know when that 
will get fixed.  I could manually copy the repo to the server and do a local 
push, but I don't know that I have the necessary permissions to do that. Or I 
could do this, which would hopefully actually solve the problem.

> > diff --git a/cache.h b/cache.h
> > index fbdf7a815a..5e6747dbb4 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1900,6 +1900,7 @@ extern int git_parse_maybe_bool(const char *);
> >  extern int git_config_int(const char *, const char *);
> >  extern int64_t git_config_int64(const char *, const char *);
> >  extern unsigned long git_config_ulong(const char *, const char *);
> > +extern ssize_t git_config_ssize_t(const char *, const char *);
> 
> For most of our other "big" values we use git_config_ulong(). E.g.,
> core.bigfilethreshold. I suspect that would be fine for your purposes here,
> though using size_t is more correct (on Windows "unsigned long" is still only
> 32 bits, even on 64-bit systems).
> 
> The ultimate fate of this number, though, is to be handed to:
> 
>   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, rpc->len);
> 
> where the final argument is interpreted as a long. So I suspect that on 64-bit
> Windows, setting http.postbuffer to "3G" would cause some kind of weird
> error (either a truncated post or some internal curl error due to the negative
> size, depending on how curl handles it).

Ah, so we would need to use CURLOPT_POSTFIELDSIZE_LARGE.  Will re-roll.
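
Sketching that re-roll (assuming rpc->len keeps its current meaning), the
setopt call would become something like:

  /* curl_off_t stays 64 bits even on LLP64 platforms, so sizes above
   * 2GB survive the call */
  curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE,
                   (curl_off_t)rpc->len);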

> 
> I think a "git_config_long()" would probably do everything correctly.
> 
> > +static int git_parse_ssize_t(const char *value, ssize_t *ret) {
> > +	ssize_t tmp;
> > +	if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(ssize_t)))
> > +		return 0;
> > +	*ret = tmp;
> > +	return 1;
> > +}
> 
> I saw the earlier iteration used a size_t, but you switched it after the
> compiler (rightfully) complained about the signedness. But I'm not sure
> why we would want ssize_t here instead of just using git_parse_unsigned().

It was originally signed.  I'm not sure why that was, but I figured it would be 
simpler to save the extra bit just in case.


Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-01 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Mar 31, 2017 at 01:26:31PM -0400, David Turner wrote:
>
>> Unfortunately, in order to push some large repos, the http postbuffer
>> must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
>> we just malloc a larger buffer.
>
> I'm still not sure why a 2GB post-buffer is necessary. It sounds like
> something is broken in your setup. Large pushes should be sent chunked.
>
> I know broken setups are a fact of life, but this feels like a really
> hacky work-around.

Yeah, I tend to share that "Huh? We do not want to do this, but what
forces us to?" puzzlement.

Tentatively demoted to 'pu' (out of 'jch'--candidates to go to 'next').


Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-01 Thread Jeff King
On Fri, Mar 31, 2017 at 01:26:31PM -0400, David Turner wrote:

> Unfortunately, in order to push some large repos, the http postbuffer
> must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
> we just malloc a larger buffer.

I'm still not sure why a 2GB post-buffer is necessary. It sounds like
something is broken in your setup. Large pushes should be sent chunked.

I know broken setups are a fact of life, but this feels like a really
hacky work-around.

> diff --git a/cache.h b/cache.h
> index fbdf7a815a..5e6747dbb4 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1900,6 +1900,7 @@ extern int git_parse_maybe_bool(const char *);
>  extern int git_config_int(const char *, const char *);
>  extern int64_t git_config_int64(const char *, const char *);
>  extern unsigned long git_config_ulong(const char *, const char *);
> +extern ssize_t git_config_ssize_t(const char *, const char *);

For most of our other "big" values we use git_config_ulong(). E.g.,
core.bigfilethreshold. I suspect that would be fine for your purposes
here, though using size_t is more correct (on Windows "unsigned long" is
still only 32 bits, even on 64-bit systems).

The ultimate fate of this number, though, is to be handed to:

  curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, rpc->len);

where the final argument is interpreted as a long. So I suspect that on
64-bit Windows, setting http.postbuffer to "3G" would cause some kind of
weird error (either a truncated post or some internal curl error due to
the negative size, depending on how curl handles it). And in that sense
it's worse than the status quo, which rejects this at the config level.
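
A tiny standalone demo (not git code) of the truncation in question:

  #include <stdio.h>

  int main(void)
  {
          size_t len = (size_t)3 * 1024 * 1024 * 1024; /* a 3GB post body */
          long as_long = (long)len;

          printf("sizeof(long)=%zu sizeof(size_t)=%zu\n",
                 sizeof(long), sizeof(size_t));
          /* on 64-bit Windows this prints 4 and 8, and the conversion
           * above is implementation-defined, typically wrapping to a
           * negative long */
          printf("len=%zu as_long=%ld\n", len, as_long);
          return 0;
  }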

I think a "git_config_long()" would probably do everything correctly.

> +static int git_parse_ssize_t(const char *value, ssize_t *ret)
> +{
> +	ssize_t tmp;
> +	if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(ssize_t)))
> +		return 0;
> +	*ret = tmp;
> +	return 1;
> +}

I saw the earlier iteration used a size_t, but you switched it after the
compiler (rightfully) complained about the signedness. But I'm not sure
why we would want ssize_t here instead of just using git_parse_unsigned().

All of that's moot if we switch to parsing it as a "long" anyway,
though.

-Peff


Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-03-31 Thread Junio C Hamano
David Turner  writes:

> +static int git_parse_ssize_t(const char *value, ssize_t *ret)
> +{
> +	ssize_t tmp;
> +	if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(ssize_t)))
> +		return 0;
> +	*ret = tmp;
> +	return 1;
> +}
> +
>  NORETURN
>  static void die_bad_number(const char *name, const char *value)
>  {
> @@ -892,6 +901,14 @@ unsigned long git_config_ulong(const char *name, const char *value)
>  	return ret;
>  }
>
> +ssize_t git_config_ssize_t(const char *name, const char *value)
> +{
> +	ssize_t ret;
> +	if (!git_parse_ssize_t(value, &ret))
> +		die_bad_number(name, value);
> +	return ret;
> +}
> +
> +

Thanks.