Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero

2018-09-12 Thread Junio C Hamano
Jeff King  writes:

> I really wonder if this topic is worth pursuing further without finding
> a real-world case that actually fails with the v2.19 code. I.e., is
> there actually a server that doesn't set CONTENT_LENGTH and really can't
> handle read-to-eof? It's plausible to me, but it's also equally
> plausible that we'd be breaking some other case.

OK.


Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero

2018-09-12 Thread Jonathan Nieder
Jeff King wrote:
> On Tue, Sep 11, 2018 at 11:15:13AM -0700, Junio C Hamano wrote:

>> I do not think we would mind terribly if we do not support
>> combinations like gzipped-and-then-chunked from day one.  An in-code
>> NEEDSWORK comment that refers to the production in RFC 2616 Page 143
>> may not hurt, though.
>
> It's pretty common for Git to send gzip'd contents, so this might
> actually be necessary on day one. However, it looks like we do so by
> setting the content-encoding header.

Correct, we haven't been using Transfer-Encoding for that.

> I really wonder if this topic is worth pursuing further without finding
> a real-world case that actually fails with the v2.19 code. I.e., is
> there actually a server that doesn't set CONTENT_LENGTH and really can't
> handle read-to-eof? It's plausible to me, but it's also equally
> plausible that we'd be breaking some other case.

I wonder about the motivating IIS case.  The CGI spec says that
CONTENT_LENGTH is set if and only if the message has a message-body.
When discussing message-body, it says

  Request-Data   = [ request-body ] [ extension-data ]
[...]
   A request-body is supplied with the request if the CONTENT_LENGTH is
   not NULL.  The server MUST make at least that many bytes available
   for the script to read.  The server MAY signal an end-of-file
   condition after CONTENT_LENGTH bytes have been read or it MAY supply
   extension data.  Therefore, the script MUST NOT attempt to read more
   than CONTENT_LENGTH bytes, even if more data is available.

Does that mean that if CONTENT_LENGTH is not set, then we are
guaranteed to see EOF, because extension-data cannot be present?  If
so, then what we have in v2.19 (plus Max's test improvement that is in
"next") is already enough.

So I agree.

 1. Junio, please eject this patch from "pu", since we don't have any
need for it.

 2. IIS users, please test v2.19 and let us know how it goes.

Do we have any scenarios that would use an empty POST (or other
non-GET) request?

Thanks,
Jonathan


Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero

2018-09-11 Thread Jeff King
On Tue, Sep 11, 2018 at 11:15:13AM -0700, Junio C Hamano wrote:

> > That said, a quick search of codesearch.debian.net mostly finds
> > examples using straight comparison, so maybe the patch is fine as-is.
> 
> I do not think we would mind terribly if we do not support
> combinations like gzipped-and-then-chunked from day one.  An in-code
> NEEDSWORK comment that refers to the production in RFC 2616 Page 143
> may not hurt, though.

It's pretty common for Git to send gzip'd contents, so this might
actually be necessary on day one. However, it looks like we do so by
setting the content-encoding header.

I really wonder if this topic is worth pursuing further without finding
a real-world case that actually fails with the v2.19 code. I.e., is
there actually a server that doesn't set CONTENT_LENGTH and really can't
handle read-to-eof? It's plausible to me, but it's also equally
plausible that we'd be breaking some other case.

-Peff


Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero

2018-09-11 Thread Junio C Hamano
Junio C Hamano  writes:

>>> +   /*
>>> +* According to RFC 3875, an empty or missing
>>> +* CONTENT_LENGTH means "no body", but RFC 3875
>>> +* precedes HTTP/1.1 and chunked encoding. Apache and
>>> +* its imitators leave CONTENT_LENGTH unset for
>>
>> Which imitators?  Maybe this should just say "Apache leaves [...]".
>
> I tend to agree; I do not mind amending the text while queuing.
> ...
> I do not think we would mind terribly if we do not support
> combinations like gzipped-and-then-chunked from day one.  An in-code
> NEEDSWORK comment that refers to the production in RFC 2616 Page 143
> may not hurt, though.

I refrained from reflowing the first paragraph of the comment in
this message, but will probably reflow it before committing, if the
updated text is acceptable.


 http-backend.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index 8f515a6def..b997eafb00 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -357,10 +357,17 @@ static ssize_t get_content_length(void)
/*
 * According to RFC 3875, an empty or missing
 * CONTENT_LENGTH means "no body", but RFC 3875
-* precedes HTTP/1.1 and chunked encoding. Apache and
-* its imitators leave CONTENT_LENGTH unset for
+* precedes HTTP/1.1 and chunked encoding. Apache
+* leaves CONTENT_LENGTH unset for
 * chunked requests, for which we should use EOF to
 * detect the end of the request.
+*
+* NEEDSWORK: Transfer-Encoding header is defined to
+* be a list of elements where "chunked", if exists,
+* must be at the end.  The current code only deals
+* with the case where "chunked" is the only element.
+* See RFC 2616 (14.41 Transfer-Encoding) when
+* extending this code.
 */
str = getenv("HTTP_TRANSFER_ENCODING");
if (str && !strcmp(str, "chunked"))


Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero

2018-09-11 Thread Junio C Hamano
Jonathan Nieder  writes:

> Kicking off the reviews: ;-)
>
> Jonathan Nieder wrote:
>
>> --- a/http-backend.c
>> +++ b/http-backend.c
>> @@ -350,10 +350,25 @@ static ssize_t read_request_fixed_len(int fd, ssize_t 
>> req_len, unsigned char **o
>>  
>>  static ssize_t get_content_length(void)
> [...]
>> +/*
>> + * According to RFC 3875, an empty or missing
>> + * CONTENT_LENGTH means "no body", but RFC 3875
>> + * precedes HTTP/1.1 and chunked encoding. Apache and
>> + * its imitators leave CONTENT_LENGTH unset for
>
> Which imitators?  Maybe this should just say "Apache leaves [...]".

I tend to agree; I do not mind amending the text while queuing.

>> + * chunked requests, for which we should use EOF to
>> + * detect the end of the request.
>> + */
>> +str = getenv("HTTP_TRANSFER_ENCODING");
>> +if (str && !strcmp(str, "chunked"))
>
> RFC 2616 says Transfer-Encoding is a list of transfer-codings applied,
> in the order that they were applied, and that "chunked" is always
> applied last.  That means a transfer-encoding like
>
>   Transfer-Encoding: identity chunked
>
> would be permitted, or e.g.
>
>   Transfer-Encoding: gzip chunked
>
> Does that means we should be using a check like
>
>   str && (!strcmp(str, "chunked") || ends_with(str, " chunked"))
>
> ?

Hmph, that's 

"Transfer-Encoding" ":" 1#transfer-coding

where #rule is

   #rule
  A construct "#" is defined, similar to "*", for defining lists of
  elements. The full form is "#element" indicating at least
   and at most  elements, each separated by one or more commas
  (",") and OPTIONAL linear white space (LWS). This makes the usual
  form of lists very easy; a rule such as
 ( *LWS element *( *LWS "," *LWS element ))
  can be shown as
 1#element

So

 - you need to account for comma
 - your LWS may not be a SP

if you want to handle gzipped stream coming in a chunked form, I
think.

Unless I am missing the rule in CGI spec that is used to transform
the value on the Transfer-Encoding header to HTTP_TRANSFER_ENCODING
environment variable, that is.

> That said, a quick search of codesearch.debian.net mostly finds
> examples using straight comparison, so maybe the patch is fine as-is.

I do not think we would mind terribly if we do not support
combinations like gzipped-and-then-chunked from day one.  An in-code
NEEDSWORK comment that refers to the production in RFC 2616 Page 143
may not hurt, though.

Thanks.


Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero

2018-09-10 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> RFC 3875 (the CGI specification) explains:
>>
>>The CONTENT_LENGTH variable contains the size of the message-body
>>attached to the request, if any, in decimal number of octets.  If no
>>data is attached, then NULL (or unset).
[...]
>> But that specification was written before HTTP/1.1 and chunked
>> encoding.
>
> RFC 3875 is from October 2004, while RFC 2616 (HTTP/1.1) is from
> June 1999; I presume that 3875 only writes down what has already
> been an established practice and that is where the date discrepancy
> comes from.

Yes, CGI 1.1 is from 1995.  More details are at
https://www.w3.org/CGI/.

[...]
>> Fortunately, there's a way out.  Use the HTTP_TRANSFER_ENCODING
>> environment variable to distinguish the two cases.
>
> Cute.
>
> I'm anxious to learn how well this works in practice. Or is this a
> trick you know somebody else's system already uses (in which case,
> that's wonderful)?

Alas, I came up with it today so I don't know yet how well it will
work in practice.

I can poke around a little tomorrow in Apache to sanity-check the
approach.  Results from anyone able to test using various HTTP servers
(lighttpd, etc) would also be very welcome.

Thanks,
Jonathan


Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero

2018-09-10 Thread Junio C Hamano
Jonathan Nieder  writes:

> RFC 3875 (the CGI specification) explains:
>
>The CONTENT_LENGTH variable contains the size of the message-body
>attached to the request, if any, in decimal number of octets.  If no
>data is attached, then NULL (or unset).
>
>   CONTENT_LENGTH = "" | 1*digit
>
> And:
>
>This specification does not distinguish between zero-length (NULL)
>values and missing values.
>
> But that specification was written before HTTP/1.1 and chunked
> encoding.

RFC 3875 is from October 2004, while RFC 2616 (HTTP/1.1) is from
June 1999; I presume that 3875 only writes down what has already
been an established practice and that is where the date discrepancy
comes from.

This is a bit old but some of those who participated in the
discussion archived at https://lists.gt.net/apache/users/373042
seemed to know what they were talking about.  In short, lack of
content-length for CGI scripts driven by Apache seems to mean that
the content length is unknown and we are expected to read through to
the eof, and we are expected to ignore the CGI spec, which says
missing CONTENT_LENGTH and CONTENT_LENGTH set to an empty string
both must mean there is no message body.  Instead, we need to take
the former as a sign to read through to the end.

> Fortunately, there's a way out.  Use the HTTP_TRANSFER_ENCODING
> environment variable to distinguish the two cases.

Cute.

I'm anxious to learn how well this works in practice. Or is this a
trick you know somebody else's system already uses (in which case,
that's wonderful)?

> + if (!str || !*str) {
> + /*
> +  * According to RFC 3875, an empty or missing
> +  * CONTENT_LENGTH means "no body", but RFC 3875
> +  * precedes HTTP/1.1 and chunked encoding. Apache and
> +  * its imitators leave CONTENT_LENGTH unset for
> +  * chunked requests, for which we should use EOF to
> +  * detect the end of the request.
> +  */
> + str = getenv("HTTP_TRANSFER_ENCODING");
> + if (str && !strcmp(str, "chunked"))
> + return -1;
> +
> + return 0;
> + }
> + if (!git_parse_ssize_t(str, ))
>   die("failed to parse CONTENT_LENGTH: %s", str);
>   return val;
>  }


Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero

2018-09-10 Thread Jonathan Nieder
Kicking off the reviews: ;-)

Jonathan Nieder wrote:

> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -350,10 +350,25 @@ static ssize_t read_request_fixed_len(int fd, ssize_t 
> req_len, unsigned char **o
>  
>  static ssize_t get_content_length(void)
[...]
> + /*
> +  * According to RFC 3875, an empty or missing
> +  * CONTENT_LENGTH means "no body", but RFC 3875
> +  * precedes HTTP/1.1 and chunked encoding. Apache and
> +  * its imitators leave CONTENT_LENGTH unset for

Which imitators?  Maybe this should just say "Apache leaves [...]".

> +  * chunked requests, for which we should use EOF to
> +  * detect the end of the request.
> +  */
> + str = getenv("HTTP_TRANSFER_ENCODING");
> + if (str && !strcmp(str, "chunked"))

RFC 2616 says Transfer-Encoding is a list of transfer-codings applied,
in the order that they were applied, and that "chunked" is always
applied last.  That means a transfer-encoding like

Transfer-Encoding: identity chunked

would be permitted, or e.g.

Transfer-Encoding: gzip chunked

Does that means we should be using a check like

str && (!strcmp(str, "chunked") || ends_with(str, " chunked"))

?

That said, a quick search of codesearch.debian.net mostly finds
examples using straight comparison, so maybe the patch is fine as-is.

Thanks,
Jonathan


Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero

2018-09-10 Thread Jeff King
On Mon, Sep 10, 2018 at 07:20:28PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> > On Mon, Sep 10, 2018 at 02:22:21PM -0700, Jonathan Nieder wrote:
> 
> >> Thanks.  I am wondering if we should go all the way and do
> >>
> >>ssize_t val;
> >>const char *str = getenv("CONTENT_LENGTH");
> >>
> >>if (!str || !*str)
> >>return 0;
> >>if (!git_parse_ssize_t(str, ))
> >>die(...);
> >>return val;
> >>
> >> That would match the RFC, but it seems to make t5510-fetch.sh hang,
> [...]
> >> Do you know why?
> >
> > Yes. :)
> >
> > It's due to this comment in the patch you are replying to:
> >
> > +   if (!str) {
> > +   /*
> > +* RFC3875 says this must mean "no body", but in practice we
> > +* receive chunked encodings with no CONTENT_LENGTH. Tell 
> > the
> > +* caller to read until EOF.
> > +*/
> > +   val = -1;
> 
> Ah!  So "in practice" includes "in Apache".  An old discussion[1] on
> Apache's httpd-users list agrees.
> 
> The question then becomes: what does IIS do for zero-length requests?
> Does any other web server fail to support "read until EOF" in general?
> 
> The CGI standard does not cover chunked encoding so we can't lean on
> the standard for advice.  It's not clear to me yet whether this patch
> improves on what's in "master".

I'd note that the case in question (no CONTENT_LENGTH at all) is not
changed between this patch and master. It's only the case of
CONTENT_LENGTH set to an empty string. But I agree that it is not clear
to me whether it is actually improving anything in practice.

-Peff


Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero

2018-09-10 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Sep 10, 2018 at 02:22:21PM -0700, Jonathan Nieder wrote:

>> Thanks.  I am wondering if we should go all the way and do
>>
>>  ssize_t val;
>>  const char *str = getenv("CONTENT_LENGTH");
>>
>>  if (!str || !*str)
>>  return 0;
>>  if (!git_parse_ssize_t(str, ))
>>  die(...);
>>  return val;
>>
>> That would match the RFC, but it seems to make t5510-fetch.sh hang,
[...]
>> Do you know why?
>
> Yes. :)
>
> It's due to this comment in the patch you are replying to:
>
> +   if (!str) {
> +   /*
> +* RFC3875 says this must mean "no body", but in practice we
> +* receive chunked encodings with no CONTENT_LENGTH. Tell the
> +* caller to read until EOF.
> +*/
> +   val = -1;

Ah!  So "in practice" includes "in Apache".  An old discussion[1] on
Apache's httpd-users list agrees.

The question then becomes: what does IIS do for zero-length requests?
Does any other web server fail to support "read until EOF" in general?

The CGI standard does not cover chunked encoding so we can't lean on
the standard for advice.  It's not clear to me yet whether this patch
improves on what's in "master".

Thanks,
Jonathan

[1] 
http://mail-archives.apache.org/mod_mbox/httpd-users/200909.mbox/%3c4aaacc38.3070...@rowe-clan.net%3E


Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero

2018-09-10 Thread Jeff King
On Mon, Sep 10, 2018 at 11:53:59PM +0300, Max Kirillov wrote:

> From: Jeff King 
> Subject: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
> 
> There is no known case where empty body it used by a server as
> instruction to read until EOF, so there is no need to violate the RFC.
> Make get_content_length() return 0 in this case.
> 
> Currently there is no practical difference, as the GET request
> where it can be empty is handled without actual reading the body
> (in get_info_refs() function), but it is better to stick to the correct
> behavior.

There could be a difference if there is a server which actually sets
CONTENT_LENGTH to the empty string for a chunked body. But we don't know
of any such server at this point.

> Signed-off-by: Max Kirillov 
> ---
> The incremental. Hopefully I described the reason right. Needs "signed-off-by"

Certainly this is:

  Signed-off-by: Jeff King 

-Peff


Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero

2018-09-10 Thread Jeff King
On Mon, Sep 10, 2018 at 02:22:21PM -0700, Jonathan Nieder wrote:

> Thanks.  I am wondering if we should go all the way and do
> 
>   ssize_t val;
>   const char *str = getenv("CONTENT_LENGTH");
> 
>   if (!str || !*str)
>   return 0;
>   if (!git_parse_ssize_t(str, ))
>   die(...);
>   return val;
> 
> That would match the RFC, but it seems to make t5510-fetch.sh hang,
> right after
> 
>   ok 165 - --negotiation-tip understands abbreviated SHA-1
> 
> When I run with -v -i -x, it stalls at
> 
>   ++ git -C '/usr/local/google/home/jrn/src/git/t/trash 
> directory.t5510-fetch/httpd/www/server' tag -d alpha_1 alpha_2 beta_1 beta_2
>   Deleted tag 'alpha_1' (was a84e4a9)
>   Deleted tag 'alpha_2' (was 7dd5cf4)
>   Deleted tag 'beta_1' (was bcb5c65)
>   Deleted tag 'beta_2' (was d3b6dcd)
>   +++ pwd
>   ++ GIT_TRACE_PACKET='/usr/local/google/home/jrn/src/git/t/trash 
> directory.t5510-fetch/trace'
>   ++ git -C client fetch --negotiation-tip=alpha_1 --negotiation-tip=beta_1 
> origin alpha_s beta_s
> 
> Do you know why?

Yes. :)

It's due to this comment in the patch you are replying to:

+   if (!str) {
+   /*
+* RFC3875 says this must mean "no body", but in practice we
+* receive chunked encodings with no CONTENT_LENGTH. Tell the
+* caller to read until EOF.
+*/
+   val = -1;

-Peff


Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero

2018-09-10 Thread Jonathan Nieder
Hi,

Max Kirillov wrote:

> From: Jeff King 
> Subject: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero

micronit: s/Treat/treat/

> There is no known case where empty body it used by a server as
> instruction to read until EOF, so there is no need to violate the RFC.
> Make get_content_length() return 0 in this case.
>
> Currently there is no practical difference, as the GET request
> where it can be empty is handled without actual reading the body
> (in get_info_refs() function), but it is better to stick to the correct
> behavior.
>
> Signed-off-by: Max Kirillov 
> ---
> The incremental. Hopefully I described the reason right. Needs "signed-off-by"

Thanks.  I am wondering if we should go all the way and do

ssize_t val;
const char *str = getenv("CONTENT_LENGTH");

if (!str || !*str)
return 0;
if (!git_parse_ssize_t(str, ))
die(...);
return val;

That would match the RFC, but it seems to make t5510-fetch.sh hang,
right after

  ok 165 - --negotiation-tip understands abbreviated SHA-1

When I run with -v -i -x, it stalls at

  ++ git -C '/usr/local/google/home/jrn/src/git/t/trash 
directory.t5510-fetch/httpd/www/server' tag -d alpha_1 alpha_2 beta_1 beta_2
  Deleted tag 'alpha_1' (was a84e4a9)
  Deleted tag 'alpha_2' (was 7dd5cf4)
  Deleted tag 'beta_1' (was bcb5c65)
  Deleted tag 'beta_2' (was d3b6dcd)
  +++ pwd
  ++ GIT_TRACE_PACKET='/usr/local/google/home/jrn/src/git/t/trash 
directory.t5510-fetch/trace'
  ++ git -C client fetch --negotiation-tip=alpha_1 --negotiation-tip=beta_1 
origin alpha_s beta_s

Do you know why?

Thanks,
Jonathan