Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
Jeff King writes: > I really wonder if this topic is worth pursuing further without finding > a real-world case that actually fails with the v2.19 code. I.e., is > there actually a server that doesn't set CONTENT_LENGTH and really can't > handle read-to-eof? It's plausible to me, but it's also equally > plausible that we'd be breaking some other case. OK.
Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
Jeff King wrote: > On Tue, Sep 11, 2018 at 11:15:13AM -0700, Junio C Hamano wrote: >> I do not think we would mind terribly if we do not support >> combinations like gzipped-and-then-chunked from day one. An in-code >> NEEDSWORK comment that refers to the production in RFC 2616 Page 143 >> may not hurt, though. > > It's pretty common for Git to send gzip'd contents, so this might > actually be necessary on day one. However, it looks like we do so by > setting the content-encoding header. Correct, we haven't been using Transfer-Encoding for that. > I really wonder if this topic is worth pursuing further without finding > a real-world case that actually fails with the v2.19 code. I.e., is > there actually a server that doesn't set CONTENT_LENGTH and really can't > handle read-to-eof? It's plausible to me, but it's also equally > plausible that we'd be breaking some other case. I wonder about the motivating IIS case. The CGI spec says that CONTENT_LENGTH is set if and only if the message has a message-body. When discussing message-body, it says Request-Data = [ request-body ] [ extension-data ] [...] A request-body is supplied with the request if the CONTENT_LENGTH is not NULL. The server MUST make at least that many bytes available for the script to read. The server MAY signal an end-of-file condition after CONTENT_LENGTH bytes have been read or it MAY supply extension data. Therefore, the script MUST NOT attempt to read more than CONTENT_LENGTH bytes, even if more data is available. Does that mean that if CONTENT_LENGTH is not set, then we are guaranteed to see EOF, because extension-data cannot be present? If so, then what we have in v2.19 (plus Max's test improvement that is in "next") is already enough. So I agree. 1. Junio, please eject this patch from "pu", since we don't have any need for it. 2. IIS users, please test v2.19 and let us know how it goes. Do we have any scenarios that would use an empty POST (or other non-GET) request? Thanks, Jonathan
Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
On Tue, Sep 11, 2018 at 11:15:13AM -0700, Junio C Hamano wrote: > > That said, a quick search of codesearch.debian.net mostly finds > > examples using straight comparison, so maybe the patch is fine as-is. > > I do not think we would mind terribly if we do not support > combinations like gzipped-and-then-chunked from day one. An in-code > NEEDSWORK comment that refers to the production in RFC 2616 Page 143 > may not hurt, though. It's pretty common for Git to send gzip'd contents, so this might actually be necessary on day one. However, it looks like we do so by setting the content-encoding header. I really wonder if this topic is worth pursuing further without finding a real-world case that actually fails with the v2.19 code. I.e., is there actually a server that doesn't set CONTENT_LENGTH and really can't handle read-to-eof? It's plausible to me, but it's also equally plausible that we'd be breaking some other case. -Peff
Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
Junio C Hamano writes: >>> + /* >>> +* According to RFC 3875, an empty or missing >>> +* CONTENT_LENGTH means "no body", but RFC 3875 >>> +* precedes HTTP/1.1 and chunked encoding. Apache and >>> +* its imitators leave CONTENT_LENGTH unset for >> >> Which imitators? Maybe this should just say "Apache leaves [...]". > > I tend to agree; I do not mind amending the text while queuing. > ... > I do not think we would mind terribly if we do not support > combinations like gzipped-and-then-chunked from day one. An in-code > NEEDSWORK comment that refers to the production in RFC 2616 Page 143 > may not hurt, though. I refrained from reflowing the first paragraph of the comment in this message, but will probably reflow it before committing, if the updated text is acceptable. http-backend.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/http-backend.c b/http-backend.c index 8f515a6def..b997eafb00 100644 --- a/http-backend.c +++ b/http-backend.c @@ -357,10 +357,17 @@ static ssize_t get_content_length(void) /* * According to RFC 3875, an empty or missing * CONTENT_LENGTH means "no body", but RFC 3875 -* precedes HTTP/1.1 and chunked encoding. Apache and -* its imitators leave CONTENT_LENGTH unset for +* precedes HTTP/1.1 and chunked encoding. Apache +* leaves CONTENT_LENGTH unset for * chunked requests, for which we should use EOF to * detect the end of the request. +* +* NEEDSWORK: Transfer-Encoding header is defined to +* be a list of elements where "chunked", if exists, +* must be at the end. The current code only deals +* with the case where "chunked" is the only element. +* See RFC 2616 (14.41 Transfer-Encoding) when +* extending this code. */ str = getenv("HTTP_TRANSFER_ENCODING"); if (str && !strcmp(str, "chunked"))
Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
Jonathan Nieder writes: > Kicking off the reviews: ;-) > > Jonathan Nieder wrote: > >> --- a/http-backend.c >> +++ b/http-backend.c >> @@ -350,10 +350,25 @@ static ssize_t read_request_fixed_len(int fd, ssize_t >> req_len, unsigned char **o >> >> static ssize_t get_content_length(void) > [...] >> +/* >> + * According to RFC 3875, an empty or missing >> + * CONTENT_LENGTH means "no body", but RFC 3875 >> + * precedes HTTP/1.1 and chunked encoding. Apache and >> + * its imitators leave CONTENT_LENGTH unset for > > Which imitators? Maybe this should just say "Apache leaves [...]". I tend to agree; I do not mind amending the text while queuing. >> + * chunked requests, for which we should use EOF to >> + * detect the end of the request. >> + */ >> +str = getenv("HTTP_TRANSFER_ENCODING"); >> +if (str && !strcmp(str, "chunked")) > > RFC 2616 says Transfer-Encoding is a list of transfer-codings applied, > in the order that they were applied, and that "chunked" is always > applied last. That means a transfer-encoding like > > Transfer-Encoding: identity chunked > > would be permitted, or e.g. > > Transfer-Encoding: gzip chunked > > Does that means we should be using a check like > > str && (!strcmp(str, "chunked") || ends_with(str, " chunked")) > > ? Hmph, that's "Transfer-Encoding" ":" 1#transfer-coding where #rule is #rule A construct "#" is defined, similar to "*", for defining lists of elements. The full form is "#element" indicating at least and at most elements, each separated by one or more commas (",") and OPTIONAL linear white space (LWS). This makes the usual form of lists very easy; a rule such as ( *LWS element *( *LWS "," *LWS element )) can be shown as 1#element So - you need to account for comma - your LWS may not be a SP if you want to handle gzipped stream coming in a chunked form, I think. Unless I am missing the rule in CGI spec that is used to transform the value on the Transfer-Encoding header to HTTP_TRANSFER_ENCODING environment variable, that is. > That said, a quick search of codesearch.debian.net mostly finds > examples using straight comparison, so maybe the patch is fine as-is. I do not think we would mind terribly if we do not support combinations like gzipped-and-then-chunked from day one. An in-code NEEDSWORK comment that refers to the production in RFC 2616 Page 143 may not hurt, though. Thanks.
Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
Junio C Hamano wrote: > Jonathan Nieder writes: >> RFC 3875 (the CGI specification) explains: >> >>The CONTENT_LENGTH variable contains the size of the message-body >>attached to the request, if any, in decimal number of octets. If no >>data is attached, then NULL (or unset). [...] >> But that specification was written before HTTP/1.1 and chunked >> encoding. > > RFC 3875 is from October 2004, while RFC 2616 (HTTP/1.1) is from > June 1999; I presume that 3875 only writes down what has already > been an established practice and that is where the date discrepancy > comes from. Yes, CGI 1.1 is from 1995. More details are at https://www.w3.org/CGI/. [...] >> Fortunately, there's a way out. Use the HTTP_TRANSFER_ENCODING >> environment variable to distinguish the two cases. > > Cute. > > I'm anxious to learn how well this works in practice. Or is this a > trick you know somebody else's system already uses (in which case, > that's wonderful)? Alas, I came up with it today so I don't know yet how well it will work in practice. I can poke around a little tomorrow in Apache to sanity-check the approach. Results from anyone able to test using various HTTP servers (lighttpd, etc) would also be very welcome. Thanks, Jonathan
Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
Jonathan Nieder writes: > RFC 3875 (the CGI specification) explains: > >The CONTENT_LENGTH variable contains the size of the message-body >attached to the request, if any, in decimal number of octets. If no >data is attached, then NULL (or unset). > > CONTENT_LENGTH = "" | 1*digit > > And: > >This specification does not distinguish between zero-length (NULL) >values and missing values. > > But that specification was written before HTTP/1.1 and chunked > encoding. RFC 3875 is from October 2004, while RFC 2616 (HTTP/1.1) is from June 1999; I presume that 3875 only writes down what has already been an established practice and that is where the date discrepancy comes from. This is a bit old but some of those who participated in the discussion archived at https://lists.gt.net/apache/users/373042 seemed to know what they were talking about. In short, lack of content-length for CGI scripts driven by Apache seems to mean that the content length is unknown and we are expected to read through to the eof, and we are expected to ignore the CGI spec, which says missing CONTENT_LENGTH and CONTENT_LENGTH set to an empty string both must mean there is no message body. Instead, we need to take the former as a sign to read through to the end. > Fortunately, there's a way out. Use the HTTP_TRANSFER_ENCODING > environment variable to distinguish the two cases. Cute. I'm anxious to learn how well this works in practice. Or is this a trick you know somebody else's system already uses (in which case, that's wonderful)? > + if (!str || !*str) { > + /* > + * According to RFC 3875, an empty or missing > + * CONTENT_LENGTH means "no body", but RFC 3875 > + * precedes HTTP/1.1 and chunked encoding. Apache and > + * its imitators leave CONTENT_LENGTH unset for > + * chunked requests, for which we should use EOF to > + * detect the end of the request. > + */ > + str = getenv("HTTP_TRANSFER_ENCODING"); > + if (str && !strcmp(str, "chunked")) > + return -1; > + > + return 0; > + } > + if (!git_parse_ssize_t(str, &val)) > die("failed to parse CONTENT_LENGTH: %s", str); > return val; > }
Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
Kicking off the reviews: ;-) Jonathan Nieder wrote: > --- a/http-backend.c > +++ b/http-backend.c > @@ -350,10 +350,25 @@ static ssize_t read_request_fixed_len(int fd, ssize_t > req_len, unsigned char **o > > static ssize_t get_content_length(void) [...] > + /* > + * According to RFC 3875, an empty or missing > + * CONTENT_LENGTH means "no body", but RFC 3875 > + * precedes HTTP/1.1 and chunked encoding. Apache and > + * its imitators leave CONTENT_LENGTH unset for Which imitators? Maybe this should just say "Apache leaves [...]". > + * chunked requests, for which we should use EOF to > + * detect the end of the request. > + */ > + str = getenv("HTTP_TRANSFER_ENCODING"); > + if (str && !strcmp(str, "chunked")) RFC 2616 says Transfer-Encoding is a list of transfer-codings applied, in the order that they were applied, and that "chunked" is always applied last. That means a transfer-encoding like Transfer-Encoding: identity chunked would be permitted, or e.g. Transfer-Encoding: gzip chunked Does that means we should be using a check like str && (!strcmp(str, "chunked") || ends_with(str, " chunked")) ? That said, a quick search of codesearch.debian.net mostly finds examples using straight comparison, so maybe the patch is fine as-is. Thanks, Jonathan
[PATCH] http-backend: treat empty CONTENT_LENGTH as zero
As discussed in v2.19.0-rc0~45^2~2 (http-backend: respect CONTENT_LENGTH as specified by rfc3875, 2018-06-10), HTTP servers such as IIS do not close a CGI script's standard input at the end of a request, instead expecting CGI scripts to stop reading after CONTENT_LENGTH bytes. That commit taught http-backend to respect this convention except when CONTENT_LENGTH is unset, in which case it preserved the previous behavior of reading until EOF. RFC 3875 (the CGI specification) explains: The CONTENT_LENGTH variable contains the size of the message-body attached to the request, if any, in decimal number of octets. If no data is attached, then NULL (or unset). CONTENT_LENGTH = "" | 1*digit And: This specification does not distinguish between zero-length (NULL) values and missing values. But that specification was written before HTTP/1.1 and chunked encoding. With chunked encoding, the length of a request is not known early and it is useful to start a CGI script to process it anyway, so Apache and many other servers violate the spec: they leave CONTENT_LENGTH unset and rely on EOF to indicate the end of request. This is reproducible using t5510-fetch.sh, which hangs if http-backend is patched to treat a missing CONTENT_LENGTH as zero. So we are in a bind: to support HTTP servers that don't produce EOF, http-backend should respect an unset or empty CONTENT_LENGTH that represents zero, and to support chunked encoding, http-backend should respect an unset CONTENT_LENGTH that represents "read until EOF". Fortunately, there's a way out. Use the HTTP_TRANSFER_ENCODING environment variable to distinguish the two cases. Reported-by: Jeff King Helped-by: Max Kirillov Signed-off-by: Jonathan Nieder --- How about this? http-backend.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/http-backend.c b/http-backend.c index 458642ef72..7902eeb0b3 100644 --- a/http-backend.c +++ b/http-backend.c @@ -350,10 +350,25 @@ static ssize_t read_request_fixed_len(int fd, ssize_t req_len, unsigned char **o static ssize_t get_content_length(void) { - ssize_t val = -1; + ssize_t val; const char *str = getenv("CONTENT_LENGTH"); - if (str && *str && !git_parse_ssize_t(str, &val)) + if (!str || !*str) { + /* +* According to RFC 3875, an empty or missing +* CONTENT_LENGTH means "no body", but RFC 3875 +* precedes HTTP/1.1 and chunked encoding. Apache and +* its imitators leave CONTENT_LENGTH unset for +* chunked requests, for which we should use EOF to +* detect the end of the request. +*/ + str = getenv("HTTP_TRANSFER_ENCODING"); + if (str && !strcmp(str, "chunked")) + return -1; + + return 0; + } + if (!git_parse_ssize_t(str, &val)) die("failed to parse CONTENT_LENGTH: %s", str); return val; } -- 2.19.0.397.gdd90340f6a
Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
On Mon, Sep 10, 2018 at 07:20:28PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > On Mon, Sep 10, 2018 at 02:22:21PM -0700, Jonathan Nieder wrote: > > >> Thanks. I am wondering if we should go all the way and do > >> > >>ssize_t val; > >>const char *str = getenv("CONTENT_LENGTH"); > >> > >>if (!str || !*str) > >>return 0; > >>if (!git_parse_ssize_t(str, &val)) > >>die(...); > >>return val; > >> > >> That would match the RFC, but it seems to make t5510-fetch.sh hang, > [...] > >> Do you know why? > > > > Yes. :) > > > > It's due to this comment in the patch you are replying to: > > > > + if (!str) { > > + /* > > +* RFC3875 says this must mean "no body", but in practice we > > +* receive chunked encodings with no CONTENT_LENGTH. Tell > > the > > +* caller to read until EOF. > > +*/ > > + val = -1; > > Ah! So "in practice" includes "in Apache". An old discussion[1] on > Apache's httpd-users list agrees. > > The question then becomes: what does IIS do for zero-length requests? > Does any other web server fail to support "read until EOF" in general? > > The CGI standard does not cover chunked encoding so we can't lean on > the standard for advice. It's not clear to me yet whether this patch > improves on what's in "master". I'd note that the case in question (no CONTENT_LENGTH at all) is not changed between this patch and master. It's only the case of CONTENT_LENGTH set to an empty string. But I agree that it is not clear to me whether it is actually improving anything in practice. -Peff
Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
Jeff King wrote: > On Mon, Sep 10, 2018 at 02:22:21PM -0700, Jonathan Nieder wrote: >> Thanks. I am wondering if we should go all the way and do >> >> ssize_t val; >> const char *str = getenv("CONTENT_LENGTH"); >> >> if (!str || !*str) >> return 0; >> if (!git_parse_ssize_t(str, &val)) >> die(...); >> return val; >> >> That would match the RFC, but it seems to make t5510-fetch.sh hang, [...] >> Do you know why? > > Yes. :) > > It's due to this comment in the patch you are replying to: > > + if (!str) { > + /* > +* RFC3875 says this must mean "no body", but in practice we > +* receive chunked encodings with no CONTENT_LENGTH. Tell the > +* caller to read until EOF. > +*/ > + val = -1; Ah! So "in practice" includes "in Apache". An old discussion[1] on Apache's httpd-users list agrees. The question then becomes: what does IIS do for zero-length requests? Does any other web server fail to support "read until EOF" in general? The CGI standard does not cover chunked encoding so we can't lean on the standard for advice. It's not clear to me yet whether this patch improves on what's in "master". Thanks, Jonathan [1] http://mail-archives.apache.org/mod_mbox/httpd-users/200909.mbox/%3c4aaacc38.3070...@rowe-clan.net%3E
Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
On Mon, Sep 10, 2018 at 11:53:59PM +0300, Max Kirillov wrote: > From: Jeff King > Subject: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero > > There is no known case where empty body it used by a server as > instruction to read until EOF, so there is no need to violate the RFC. > Make get_content_length() return 0 in this case. > > Currently there is no practical difference, as the GET request > where it can be empty is handled without actual reading the body > (in get_info_refs() function), but it is better to stick to the correct > behavior. There could be a difference if there is a server which actually sets CONTENT_LENGTH to the empty string for a chunked body. But we don't know of any such server at this point. > Signed-off-by: Max Kirillov > --- > The incremental. Hopefully I described the reason right. Needs "signed-off-by" Certainly this is: Signed-off-by: Jeff King -Peff
Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
On Mon, Sep 10, 2018 at 02:22:21PM -0700, Jonathan Nieder wrote: > Thanks. I am wondering if we should go all the way and do > > ssize_t val; > const char *str = getenv("CONTENT_LENGTH"); > > if (!str || !*str) > return 0; > if (!git_parse_ssize_t(str, &val)) > die(...); > return val; > > That would match the RFC, but it seems to make t5510-fetch.sh hang, > right after > > ok 165 - --negotiation-tip understands abbreviated SHA-1 > > When I run with -v -i -x, it stalls at > > ++ git -C '/usr/local/google/home/jrn/src/git/t/trash > directory.t5510-fetch/httpd/www/server' tag -d alpha_1 alpha_2 beta_1 beta_2 > Deleted tag 'alpha_1' (was a84e4a9) > Deleted tag 'alpha_2' (was 7dd5cf4) > Deleted tag 'beta_1' (was bcb5c65) > Deleted tag 'beta_2' (was d3b6dcd) > +++ pwd > ++ GIT_TRACE_PACKET='/usr/local/google/home/jrn/src/git/t/trash > directory.t5510-fetch/trace' > ++ git -C client fetch --negotiation-tip=alpha_1 --negotiation-tip=beta_1 > origin alpha_s beta_s > > Do you know why? Yes. :) It's due to this comment in the patch you are replying to: + if (!str) { + /* +* RFC3875 says this must mean "no body", but in practice we +* receive chunked encodings with no CONTENT_LENGTH. Tell the +* caller to read until EOF. +*/ + val = -1; -Peff
Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
Hi, Max Kirillov wrote: > From: Jeff King > Subject: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero micronit: s/Treat/treat/ > There is no known case where empty body it used by a server as > instruction to read until EOF, so there is no need to violate the RFC. > Make get_content_length() return 0 in this case. > > Currently there is no practical difference, as the GET request > where it can be empty is handled without actual reading the body > (in get_info_refs() function), but it is better to stick to the correct > behavior. > > Signed-off-by: Max Kirillov > --- > The incremental. Hopefully I described the reason right. Needs "signed-off-by" Thanks. I am wondering if we should go all the way and do ssize_t val; const char *str = getenv("CONTENT_LENGTH"); if (!str || !*str) return 0; if (!git_parse_ssize_t(str, &val)) die(...); return val; That would match the RFC, but it seems to make t5510-fetch.sh hang, right after ok 165 - --negotiation-tip understands abbreviated SHA-1 When I run with -v -i -x, it stalls at ++ git -C '/usr/local/google/home/jrn/src/git/t/trash directory.t5510-fetch/httpd/www/server' tag -d alpha_1 alpha_2 beta_1 beta_2 Deleted tag 'alpha_1' (was a84e4a9) Deleted tag 'alpha_2' (was 7dd5cf4) Deleted tag 'beta_1' (was bcb5c65) Deleted tag 'beta_2' (was d3b6dcd) +++ pwd ++ GIT_TRACE_PACKET='/usr/local/google/home/jrn/src/git/t/trash directory.t5510-fetch/trace' ++ git -C client fetch --negotiation-tip=alpha_1 --negotiation-tip=beta_1 origin alpha_s beta_s Do you know why? Thanks, Jonathan
[PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
From: Jeff King Subject: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero There is no known case where empty body it used by a server as instruction to read until EOF, so there is no need to violate the RFC. Make get_content_length() return 0 in this case. Currently there is no practical difference, as the GET request where it can be empty is handled without actual reading the body (in get_info_refs() function), but it is better to stick to the correct behavior. Signed-off-by: Max Kirillov --- The incremental. Hopefully I described the reason right. Needs "signed-off-by" http-backend.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/http-backend.c b/http-backend.c index 458642ef72..ea36a52118 100644 --- a/http-backend.c +++ b/http-backend.c @@ -353,8 +353,28 @@ static ssize_t get_content_length(void) ssize_t val = -1; const char *str = getenv("CONTENT_LENGTH"); - if (str && *str && !git_parse_ssize_t(str, &val)) - die("failed to parse CONTENT_LENGTH: %s", str); + if (!str) { + /* +* RFC3875 says this must mean "no body", but in practice we +* receive chunked encodings with no CONTENT_LENGTH. Tell the +* caller to read until EOF. +*/ + val = -1; + } else if (!*str) { + /* +* An empty length should be treated as "no body" according to +* RFC3875, and this seems to hold in practice. +*/ + val = 0; + } else { + /* +* We have a non-empty CONTENT_LENGTH; trust what's in it as long +* as it can be parsed. +*/ + if (!git_parse_ssize_t(str, &val)) + die("failed to parse CONTENT_LENGTH: '%s'", str); + } + return val; } -- 2.17.0.1185.g782057d875