Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
2016-08-12 10:27 GMT+02:00 Luca Toscano: > > > 2016-08-11 18:14 GMT+02:00 William A Rowe Jr : > >> Hi Luca, >> >> it helps at this point in the discussions to post the delta to >> branches/2.4.x, >> for all of us to see the delta between 'current' and 'prospective' >> behaviors, >> and ignoring the status of trunk for the moment. >> >> Attached is that delta... >> >> > Thanks a lot William, you are completely right. I created another proposal > that might be a bit better (apologies in advance if my english is not good > enough): > > 1) diff with current trunk: http://apaste.info/ck8 > 2) complete diff from 2.4.x: http://apaste.info/dtI > > In this way the following use cases will be covered by only one log: > > 1) (F)CGI backend sends a Last-Modified header not in GMT and considered > in the future by httpd (like now() in the EU/Paris timezone) - first log > 2) (F)CGI backend sends a Last-Modified header not in GMT and not > considered in the future by httpd (like now() + 2 hours in the US West > timezone) - second log > 3) (F)CGI backend sends a Last-Modified header in GMT but with a datetime > in the future - third log > > Manually tested in trunk and it seems working fine, but if we reach > consensus this would also need a test in the httpd suite. > > Simplified a bit the wording and added more comments: http://apaste.info/m8v (diff with current trunk). The use cases are the same as the ones described in my previous email. William (and everybody that wants to give me feedback), would you mind to review the logs and let me know your thoughts? Luca
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
Hi Luca, it helps at this point in the discussions to post the delta to branches/2.4.x, for all of us to see the delta between 'current' and 'prospective' behaviors, and ignoring the status of trunk for the moment. Attached is that delta... On Thu, Aug 11, 2016 at 5:49 AM, Luca Toscanowrote: > > > 2016-08-04 11:52 GMT+02:00 Luca Toscano : > >> >> >> 2016-08-02 10:17 GMT+02:00 Luca Toscano : >> >>> >>> 2016-08-01 21:13 GMT+02:00 Jacob Champion As stated above, this is not my first choice -- but I wouldn't oppose it if that's what the consensus comes to. else if (!ap_cstr_casecmp(w, "Last-Modified")) { > -apr_time_t parsed_date = apr_date_parse_rfc(l); > +apr_time_t parsed_date = apr_date_parse_http(l); > apr_date_parse_http() is not good enough; IIUC, it completely ignores timezones, which further corrupts non-GMT Last-Modified stamps. We either want strict parsing or actual correction, not something in the middle. >>> >>> >>> You are right, but this is the current behavior, this is why I used >>> apr_date_parse_http. My idea was to keep what we have at the moment, adding >>> a clear log that states the inconsistency found by httpd and why it has >>> been corrected. In this case it would simply mean that a value considered >>> "in the future" from GMT would be logged and corrected accordingly. This >>> would still leave out other wrong use cases like a datetime string in the >>> US/West timezone "in the future" if converted in GMT, but to solve it we'd >>> need to interpret the Last-Modified header value with the >>> apr_date_parse_rfc function and afaiu we still need to reach an agreement :) >>> >>> So I am really in favor of using apr_date_parse_rfc but I wanted to come >>> up with a possible to solution to satisfy everybody and help our users with >>> some clue. >>> >>> Example about the wrong use case mentioned by me for everybody (it >>> helped me a lot so it might also be useful to other people reading): >>> >>> === >>> HTTP/1.1 200 OK >>> Date: Tue, 02 Aug 2016 07:54:39 GMT >>> Server: Apache/2.5.0-dev (Unix) >>> Last-Modified: Tue, 02 Aug 2016 00:54:39 GMT >>> Content-Type: text/html; charset=UTF-8 >>> >>> Last-Modified sent: Tue, 02 Aug 2016 00:54:39 -0700 >>> Now in GMT: Tue, 02 Aug 2016 07:54:39 + >>> === >>> >>> As you can see, the FCGI script sets a Last-Modified header value now() >>> in the "America/Los_Angeles" timezone, but it gets interpreted by httpd as >>> a GMT datestring "in the past". So for example "now() + 2 hours" in the >>> "America/Los_Angeles" timezone would still get interpreted as "in the past" >>> by httpd, avoiding any datetime correction to GMT now(). >>> >>> Same example with the code change that we have originally proposed: >>> >>> === >>> HTTP/1.1 200 OK >>> Date: Tue, 02 Aug 2016 08:08:14 GMT >>> Server: Apache/2.5.0-dev (Unix) >>> Last-Modified: Tue, 02 Aug 2016 08:08:14 GMT >>> Content-Type: text/html; charset=UTF-8 >>> >>> Last-Modified sent: Tue, 02 Aug 2016 01:08:14 -0700 >>> Now in GMT: Tue, 02 Aug 2016 08:08:14 + >>> === >>> >>> The debate is all around, as far as I have understood, to if httpd >>> should be able to do the above "interpretation" or not. >>> >>> >>> 2016-08-01 23:13 GMT+02:00 Jacob Champion : >>> On 08/01/2016 01:35 PM, William A Rowe Jr wrote: > So this alternative is not my first choice. Invalid headers should > really either be corrected (if the correction is obvious, safe, and > helpful), or dropped entirely. Or the entire response should be > 500'd, but we run into major compatibility breaks if we choose that > route. > > No, I agree that a 500 is not an option. Drop it, or fix it loudly in > the logs, > but we can't break existing deployments (which don't accept non-GMT > dates today, FWIW). > Based on conversations with Luca, existing deployments do "accept" non-GMT dates, silently corrupting the value, due to the use of the *_parse_http() function. I'm happy to react to bad input following a parsable date string, e.g. > any > non-"GMT" signifier, as entirely bad input worth emitting to the error > log. > Let's help CGI authors find their bugs instead of generating unexpected > results, such as 5-8 hour wrong "Last Modified" dates in the US and > now() throughout Europe, owing to the time zone deltas. > Cool. I'm not opposed to loudly dropping non-GMT timestamps (though I still prefer fixing them up). >>> >>> >>> Wouldn't this introduce a breaking change? I know that everybody is >>> supposed to look into the changelog before upgrading httpd but as we often >>> see this is not the case and we could trigger
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
2016-08-04 11:52 GMT+02:00 Luca Toscano: > > > 2016-08-02 10:17 GMT+02:00 Luca Toscano : > >> >> 2016-08-01 21:13 GMT+02:00 Jacob Champion >>> >>> >>> As stated above, this is not my first choice -- but I wouldn't oppose it >>> if that's what the consensus comes to. >>> >>> else if (!ap_cstr_casecmp(w, "Last-Modified")) { -apr_time_t parsed_date = apr_date_parse_rfc(l); +apr_time_t parsed_date = apr_date_parse_http(l); >>> >>> apr_date_parse_http() is not good enough; IIUC, it completely ignores >>> timezones, which further corrupts non-GMT Last-Modified stamps. We either >>> want strict parsing or actual correction, not something in the middle. >> >> >> You are right, but this is the current behavior, this is why I used >> apr_date_parse_http. My idea was to keep what we have at the moment, adding >> a clear log that states the inconsistency found by httpd and why it has >> been corrected. In this case it would simply mean that a value considered >> "in the future" from GMT would be logged and corrected accordingly. This >> would still leave out other wrong use cases like a datetime string in the >> US/West timezone "in the future" if converted in GMT, but to solve it we'd >> need to interpret the Last-Modified header value with the >> apr_date_parse_rfc function and afaiu we still need to reach an agreement :) >> >> So I am really in favor of using apr_date_parse_rfc but I wanted to come >> up with a possible to solution to satisfy everybody and help our users with >> some clue. >> >> Example about the wrong use case mentioned by me for everybody (it helped >> me a lot so it might also be useful to other people reading): >> >> === >> HTTP/1.1 200 OK >> Date: Tue, 02 Aug 2016 07:54:39 GMT >> Server: Apache/2.5.0-dev (Unix) >> Last-Modified: Tue, 02 Aug 2016 00:54:39 GMT >> Content-Type: text/html; charset=UTF-8 >> >> Last-Modified sent: Tue, 02 Aug 2016 00:54:39 -0700 >> Now in GMT: Tue, 02 Aug 2016 07:54:39 + >> === >> >> As you can see, the FCGI script sets a Last-Modified header value now() >> in the "America/Los_Angeles" timezone, but it gets interpreted by httpd as >> a GMT datestring "in the past". So for example "now() + 2 hours" in the >> "America/Los_Angeles" timezone would still get interpreted as "in the past" >> by httpd, avoiding any datetime correction to GMT now(). >> >> Same example with the code change that we have originally proposed: >> >> === >> HTTP/1.1 200 OK >> Date: Tue, 02 Aug 2016 08:08:14 GMT >> Server: Apache/2.5.0-dev (Unix) >> Last-Modified: Tue, 02 Aug 2016 08:08:14 GMT >> Content-Type: text/html; charset=UTF-8 >> >> Last-Modified sent: Tue, 02 Aug 2016 01:08:14 -0700 >> Now in GMT: Tue, 02 Aug 2016 08:08:14 + >> === >> >> The debate is all around, as far as I have understood, to if httpd should >> be able to do the above "interpretation" or not. >> >> >> 2016-08-01 23:13 GMT+02:00 Jacob Champion : >> >>> On 08/01/2016 01:35 PM, William A Rowe Jr wrote: >>> So this alternative is not my first choice. Invalid headers should really either be corrected (if the correction is obvious, safe, and helpful), or dropped entirely. Or the entire response should be 500'd, but we run into major compatibility breaks if we choose that route. No, I agree that a 500 is not an option. Drop it, or fix it loudly in the logs, but we can't break existing deployments (which don't accept non-GMT dates today, FWIW). >>> >>> Based on conversations with Luca, existing deployments do "accept" >>> non-GMT dates, silently corrupting the value, due to the use of the >>> *_parse_http() function. >>> >>> I'm happy to react to bad input following a parsable date string, e.g. any non-"GMT" signifier, as entirely bad input worth emitting to the error log. Let's help CGI authors find their bugs instead of generating unexpected results, such as 5-8 hour wrong "Last Modified" dates in the US and now() throughout Europe, owing to the time zone deltas. >>> >>> Cool. I'm not opposed to loudly dropping non-GMT timestamps (though I >>> still prefer fixing them up). >> >> >> Wouldn't this introduce a breaking change? I know that everybody is >> supposed to look into the changelog before upgrading httpd but as we often >> see this is not the case and we could trigger nasty issues in various >> installations imho. Compared to this solution I'd much prefer to go for the >> proposed one, invasive but less disruptive :) >> >> As Jacob said, more opinions are very welcome! >> >> One general comment: I really like discussions but in this case I feel >> that we would have solved the issue in 20 minutes talking in person :) >> So maybe a conversation on IRC whenever everybody has a bit of time could >> help
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
2016-08-02 10:17 GMT+02:00 Luca Toscano: > > 2016-08-01 21:13 GMT+02:00 Jacob Champion >> >> >> As stated above, this is not my first choice -- but I wouldn't oppose it >> if that's what the consensus comes to. >> >> else if (!ap_cstr_casecmp(w, "Last-Modified")) { >>> -apr_time_t parsed_date = apr_date_parse_rfc(l); >>> +apr_time_t parsed_date = apr_date_parse_http(l); >>> >> >> apr_date_parse_http() is not good enough; IIUC, it completely ignores >> timezones, which further corrupts non-GMT Last-Modified stamps. We either >> want strict parsing or actual correction, not something in the middle. > > > You are right, but this is the current behavior, this is why I used > apr_date_parse_http. My idea was to keep what we have at the moment, adding > a clear log that states the inconsistency found by httpd and why it has > been corrected. In this case it would simply mean that a value considered > "in the future" from GMT would be logged and corrected accordingly. This > would still leave out other wrong use cases like a datetime string in the > US/West timezone "in the future" if converted in GMT, but to solve it we'd > need to interpret the Last-Modified header value with the > apr_date_parse_rfc function and afaiu we still need to reach an agreement :) > > So I am really in favor of using apr_date_parse_rfc but I wanted to come > up with a possible to solution to satisfy everybody and help our users with > some clue. > > Example about the wrong use case mentioned by me for everybody (it helped > me a lot so it might also be useful to other people reading): > > === > HTTP/1.1 200 OK > Date: Tue, 02 Aug 2016 07:54:39 GMT > Server: Apache/2.5.0-dev (Unix) > Last-Modified: Tue, 02 Aug 2016 00:54:39 GMT > Content-Type: text/html; charset=UTF-8 > > Last-Modified sent: Tue, 02 Aug 2016 00:54:39 -0700 > Now in GMT: Tue, 02 Aug 2016 07:54:39 + > === > > As you can see, the FCGI script sets a Last-Modified header value now() in > the "America/Los_Angeles" timezone, but it gets interpreted by httpd as a > GMT datestring "in the past". So for example "now() + 2 hours" in the > "America/Los_Angeles" timezone would still get interpreted as "in the past" > by httpd, avoiding any datetime correction to GMT now(). > > Same example with the code change that we have originally proposed: > > === > HTTP/1.1 200 OK > Date: Tue, 02 Aug 2016 08:08:14 GMT > Server: Apache/2.5.0-dev (Unix) > Last-Modified: Tue, 02 Aug 2016 08:08:14 GMT > Content-Type: text/html; charset=UTF-8 > > Last-Modified sent: Tue, 02 Aug 2016 01:08:14 -0700 > Now in GMT: Tue, 02 Aug 2016 08:08:14 + > === > > The debate is all around, as far as I have understood, to if httpd should > be able to do the above "interpretation" or not. > > > 2016-08-01 23:13 GMT+02:00 Jacob Champion : > >> On 08/01/2016 01:35 PM, William A Rowe Jr wrote: >> >>> So this alternative is not my first choice. Invalid headers should >>> really either be corrected (if the correction is obvious, safe, and >>> helpful), or dropped entirely. Or the entire response should be >>> 500'd, but we run into major compatibility breaks if we choose that >>> route. >>> >>> No, I agree that a 500 is not an option. Drop it, or fix it loudly in >>> the logs, >>> but we can't break existing deployments (which don't accept non-GMT >>> dates today, FWIW). >>> >> >> Based on conversations with Luca, existing deployments do "accept" >> non-GMT dates, silently corrupting the value, due to the use of the >> *_parse_http() function. >> >> I'm happy to react to bad input following a parsable date string, e.g. any >>> non-"GMT" signifier, as entirely bad input worth emitting to the error >>> log. >>> Let's help CGI authors find their bugs instead of generating unexpected >>> results, such as 5-8 hour wrong "Last Modified" dates in the US and >>> now() throughout Europe, owing to the time zone deltas. >>> >> >> Cool. I'm not opposed to loudly dropping non-GMT timestamps (though I >> still prefer fixing them up). > > > Wouldn't this introduce a breaking change? I know that everybody is > supposed to look into the changelog before upgrading httpd but as we often > see this is not the case and we could trigger nasty issues in various > installations imho. Compared to this solution I'd much prefer to go for the > proposed one, invasive but less disruptive :) > > As Jacob said, more opinions are very welcome! > > One general comment: I really like discussions but in this case I feel > that we would have solved the issue in 20 minutes talking in person :) > So maybe a conversation on IRC whenever everybody has a bit of time could > help reaching a final consensus? > Trying with a new patch: http://apaste.info/xHz I added two separate logs, one for non-GMT header values and the other one for datetime strings considered in
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
2016-08-01 21:13 GMT+02:00 Jacob Champion> > > As stated above, this is not my first choice -- but I wouldn't oppose it > if that's what the consensus comes to. > > else if (!ap_cstr_casecmp(w, "Last-Modified")) { >> -apr_time_t parsed_date = apr_date_parse_rfc(l); >> +apr_time_t parsed_date = apr_date_parse_http(l); >> > > apr_date_parse_http() is not good enough; IIUC, it completely ignores > timezones, which further corrupts non-GMT Last-Modified stamps. We either > want strict parsing or actual correction, not something in the middle. You are right, but this is the current behavior, this is why I used apr_date_parse_http. My idea was to keep what we have at the moment, adding a clear log that states the inconsistency found by httpd and why it has been corrected. In this case it would simply mean that a value considered "in the future" from GMT would be logged and corrected accordingly. This would still leave out other wrong use cases like a datetime string in the US/West timezone "in the future" if converted in GMT, but to solve it we'd need to interpret the Last-Modified header value with the apr_date_parse_rfc function and afaiu we still need to reach an agreement :) So I am really in favor of using apr_date_parse_rfc but I wanted to come up with a possible to solution to satisfy everybody and help our users with some clue. Example about the wrong use case mentioned by me for everybody (it helped me a lot so it might also be useful to other people reading): === HTTP/1.1 200 OK Date: Tue, 02 Aug 2016 07:54:39 GMT Server: Apache/2.5.0-dev (Unix) Last-Modified: Tue, 02 Aug 2016 00:54:39 GMT Content-Type: text/html; charset=UTF-8 Last-Modified sent: Tue, 02 Aug 2016 00:54:39 -0700 Now in GMT: Tue, 02 Aug 2016 07:54:39 + === As you can see, the FCGI script sets a Last-Modified header value now() in the "America/Los_Angeles" timezone, but it gets interpreted by httpd as a GMT datestring "in the past". So for example "now() + 2 hours" in the "America/Los_Angeles" timezone would still get interpreted as "in the past" by httpd, avoiding any datetime correction to GMT now(). Same example with the code change that we have originally proposed: === HTTP/1.1 200 OK Date: Tue, 02 Aug 2016 08:08:14 GMT Server: Apache/2.5.0-dev (Unix) Last-Modified: Tue, 02 Aug 2016 08:08:14 GMT Content-Type: text/html; charset=UTF-8 Last-Modified sent: Tue, 02 Aug 2016 01:08:14 -0700 Now in GMT: Tue, 02 Aug 2016 08:08:14 + === The debate is all around, as far as I have understood, to if httpd should be able to do the above "interpretation" or not. 2016-08-01 23:13 GMT+02:00 Jacob Champion : > On 08/01/2016 01:35 PM, William A Rowe Jr wrote: > >> So this alternative is not my first choice. Invalid headers should >> really either be corrected (if the correction is obvious, safe, and >> helpful), or dropped entirely. Or the entire response should be >> 500'd, but we run into major compatibility breaks if we choose that >> route. >> >> No, I agree that a 500 is not an option. Drop it, or fix it loudly in >> the logs, >> but we can't break existing deployments (which don't accept non-GMT >> dates today, FWIW). >> > > Based on conversations with Luca, existing deployments do "accept" non-GMT > dates, silently corrupting the value, due to the use of the *_parse_http() > function. > > I'm happy to react to bad input following a parsable date string, e.g. any >> non-"GMT" signifier, as entirely bad input worth emitting to the error >> log. >> Let's help CGI authors find their bugs instead of generating unexpected >> results, such as 5-8 hour wrong "Last Modified" dates in the US and >> now() throughout Europe, owing to the time zone deltas. >> > > Cool. I'm not opposed to loudly dropping non-GMT timestamps (though I > still prefer fixing them up). Wouldn't this introduce a breaking change? I know that everybody is supposed to look into the changelog before upgrading httpd but as we often see this is not the case and we could trigger nasty issues in various installations imho. Compared to this solution I'd much prefer to go for the proposed one, invasive but less disruptive :) As Jacob said, more opinions are very welcome! One general comment: I really like discussions but in this case I feel that we would have solved the issue in 20 minutes talking in person :) So maybe a conversation on IRC whenever everybody has a bit of time could help reaching a final consensus? Luca
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On 08/01/2016 01:35 PM, William A Rowe Jr wrote: > I do like your argument that we should do as little transformation > as possible, in order to facilitate moving CGI apps between > environments. Implementation differences are nasty for everyone. But > I'm not convinced that ship hasn't sailed; currently, it looks like > we modify outgoing CGI responses in order to merge headers, > normalize Content-Type, and produce Unmodified and Precondition > Failed responses. Merging headers is a well-defined behavior in the HTTP spec (see my earlier comments on RFC 3875). Three CGI header fields are CGI-specific. I don't know that you can correlate these mandated adjustments to date formats. I mean that we're merging CGI headers with other headers that have already been defined by the server (whatever those are); the things we've added would have to be replicated in other CGI environments anyway. That's probably one of my weaker arguments though, and I'm okay with dropping it. The precondition check behavior still stands as another major non-transparent treatment of a response. But there are outstanding bugs on that behavior and, again, I'm not sure it's strong enough to stand alone as an argument. > > If there is date input that we cannot handle, the > > spec strongly encourages us to interpret it as now(), provided > > we have a > > clock (which all of our architectures do.) > In the absence of a quote from the spec, I'm still in strong > disagreement with this, based on the language I quoted last week. The spec demands future -> now() if we know now(). On other erroneous data, that's up for debate. Agreed (and see below). > Moving on to Stefan's comments: > > If we see CGI as a kind of input that is not strictly regulated by > > HTTP header formats (and that is an if), we should correct timezone > > offset to GMT, but otherwise leave the time unchanged. It might > > be our clock that has the issue. Meddling with it will not help anyone > > debugging problems. > +1 (and I am currently of the opinion that CGI is not a strict HTTP > input, as stated above). Jacob and I have entirely different opinions on that, Stefan's comment is very interesting. We have a clock on nearly every deployment of httpd, but that doesn't mean it is correct. However the spec absolutely demands that we do not send our untrusted "Date:" with a future "Last-Modified:" value, it is externally inconsistent. Ah, right. My +1 to Stefan needed the exception that we still need to pull future values back; agreed. So this alternative is not my first choice. Invalid headers should really either be corrected (if the correction is obvious, safe, and helpful), or dropped entirely. Or the entire response should be 500'd, but we run into major compatibility breaks if we choose that route. No, I agree that a 500 is not an option. Drop it, or fix it loudly in the logs, but we can't break existing deployments (which don't accept non-GMT dates today, FWIW). Based on conversations with Luca, existing deployments do "accept" non-GMT dates, silently corrupting the value, due to the use of the *_parse_http() function. I'm happy to react to bad input following a parsable date string, e.g. any non-"GMT" signifier, as entirely bad input worth emitting to the error log. Let's help CGI authors find their bugs instead of generating unexpected results, such as 5-8 hour wrong "Last Modified" dates in the US and now() throughout Europe, owing to the time zone deltas. Cool. I'm not opposed to loudly dropping non-GMT timestamps (though I still prefer fixing them up). Any other opinions? I've talked enough I think. :) --Jacob
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Mon, Aug 1, 2016 at 2:13 PM, Jacob Championwrote: > All right, getting back to this after a week off. I've tried to combine > feedback as best I can into one message. > > Bill, you wrote: > > I'm perfectly happy to translate such values to GMT for non-HTTP >> inputs, per spec. If we are going to do so for HTTP inputs, loudly >> scolding the errant developer in the logs seems prudent, for their own >> longer-term benefit. >> > > I suspect this is the core of the argument now. In my opinion, CGI is not > an HTTP input -- either as far as the spec is concerned (for spec lawyering > purposes) or in practice. It is a separate (HTTP-like) protocol, and an > implementation detail of the server. In other words, we are not "proxying" > to a CGI app; the CGI app is contained by the server and is providing > inputs to the server response. > > That is a separate argument from "is it wise to correct GMT timestamps", > though. > > 1. Why do you/reporter want HTTP applications to persist in writing code >> which breaks between different transport providers/cgi hosting >> environments? >> The language has been crystal clear for 2 decades. We do a huge disservice >> to the PHP author community to let them be idiots. Alternately, the PHP >> SAPI itself could rectify this. (We aren't talking about non-HTTP >> sources.) >> > > I'm not sure where PHP enters the conversation. They are only one (large > and important!) CGI producer; we're talking about our behavior with *all* > CGI applications here. > > I do like your argument that we should do as little transformation as > possible, in order to facilitate moving CGI apps between environments. > Implementation differences are nasty for everyone. But I'm not convinced > that ship hasn't sailed; currently, it looks like we modify outgoing CGI > responses in order to merge headers, normalize Content-Type, and produce > Unmodified and Precondition Failed responses. > Merging headers is a well-defined behavior in the HTTP spec (see my earlier comments on RFC 3875). Three CGI header fields are CGI-specific. I don't know that you can correlate these mandated adjustments to date formats. There may be others I have missed, but this doesn't look like the behavior > of a server that considers itself a transparent "passthrough" to a CGI > application. (Isn't that what CGI-NPH is for?) But! I could definitely be > swayed otherwise, if that's what we'd like to do moving forwards. I think > both sides have potential value, but we should choose one. Let's not even get into NPH scripts... heh. > If there is date input that we cannot handle, the >> spec strongly encourages us to interpret it as now(), provided we have a >> clock (which all of our architectures do.) >> > > In the absence of a quote from the spec, I'm still in strong disagreement > with this, based on the language I quoted last week. > The spec demands future -> now() if we know now(). On other erroneous data, that's up for debate. Moving on to Stefan's comments: > > If we see CGI as a kind of input that is not strictly regulated by >> HTTP header formats (and that is an if), we should correct timezone >> offset to GMT, but otherwise leave the time unchanged. It might be our >> clock that has the issue. Meddling with it will not help anyone >> debugging problems. >> > > +1 (and I am currently of the opinion that CGI is not a strict HTTP input, > as stated above). > Jacob and I have entirely different opinions on that, Stefan's comment is very interesting. We have a clock on nearly every deployment of httpd, but that doesn't mean it is correct. However the spec absolutely demands that we do not send our untrusted "Date:" with a future "Last-Modified:" value, it is externally inconsistent. > If the value is unparseable, we should log it and suppress sending >> outa "Last-Modified" completely. Also any "If-*" checking should >> behave as if the header was not present. >> > > +1. Sounds right. > The alternative is to expect the CGI to honor HTTP/1.1 header >> semantics, pass values unchanged and let CGI and client run into >> misunderstandings immediately. >> > > Practically, I'm not super opposed to this alternative (but if we choose > it, we should apply it consistently). If I put on spec-lawyer hat, the CGI > RFC has this to say: > > [https://tools.ietf.org/html/rfc3875#section-6.2.1] > >The server MUST make any appropriate modifications to the script's >>output to ensure that the response to the client complies with the >>response protocol version. >> > > So this alternative is not my first choice. Invalid headers should really > either be corrected (if the correction is obvious, safe, and helpful), or > dropped entirely. Or the entire response should be 500'd, but we run into > major compatibility breaks if we choose that route. > No, I agree that a 500 is not an option. Drop it, or fix it loudly in the logs, but we can't break existing deployments (which don't accept
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Mon, Aug 1, 2016 at 3:03 PM, Jacob Championwrote: > On 08/01/2016 12:38 PM, William A Rowe Jr wrote: > >> I'll review the rest of your comments shortly, but you might want to >> review >> https://tools.ietf.org/html/rfc3875 before claiming CGI isn't an HTTP >> input :-) >> > > I suspect we're talking past each other at this point. I'm aware of 3875 > (though I don't claim to be an expert), and I quoted it in my response. I > am using the term "HTTP input" in response to your much earlier statement > that > > (We aren't talking about non-HTTP sources.) >> > > We *are* talking about non-HTTP sources. CGI is *not* HTTP. It obviously > shares a great deal of syntax, but a CGI application is neither an HTTP > client nor an HTTP server, and it does not speak HTTP to our server. > Therefore CGI is not an "HTTP input" and we are not bound by 723x's > requirements for parsing date-stamps that originate from it -- which, IIUC, > was your argument. The wisdom of doing so (or not) is a completely separate > issue, outside the bounds of the spec. > We are bound by RFC 3875, which says... 6.3.4. Protocol-Specific Header Fields The script MAY return any other header fields that relate to the response message defined by the specification for the SERVER_PROTOCOL (HTTP/1.0 [1] or HTTP/1.1 [4]). The server MUST translate the header data from the CGI header syntax to the HTTP header syntax if these differ. For example, the character sequence for newline (such as UNIX's US-ASCII LF) used by CGI scripts may not be the same as that used by HTTP (US-ASCII CR followed by LF). E.g. no accommodation of non-HTTP input as header fields, other than to avoid hop-by-hop server <> client header fields, which httpd is required to dodge.
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On 08/01/2016 12:38 PM, William A Rowe Jr wrote: I'll review the rest of your comments shortly, but you might want to review https://tools.ietf.org/html/rfc3875 before claiming CGI isn't an HTTP input :-) I suspect we're talking past each other at this point. I'm aware of 3875 (though I don't claim to be an expert), and I quoted it in my response. I am using the term "HTTP input" in response to your much earlier statement that (We aren't talking about non-HTTP sources.) We *are* talking about non-HTTP sources. CGI is *not* HTTP. It obviously shares a great deal of syntax, but a CGI application is neither an HTTP client nor an HTTP server, and it does not speak HTTP to our server. Therefore CGI is not an "HTTP input" and we are not bound by 723x's requirements for parsing date-stamps that originate from it -- which, IIUC, was your argument. The wisdom of doing so (or not) is a completely separate issue, outside the bounds of the spec. If that is not your argument, then I've completely misunderstood, and we possibly agree with each other(?). Conforming can either refer to eliding a header line that doesn't conform, or munging a header line to a usable value, we could interpret that either way. Agreed. That's exactly my point; I'm not sure where the disagreement is coming from. --Jacob
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Mon, Aug 1, 2016 at 2:13 PM, Jacob Championwrote: > All right, getting back to this after a week off. I've tried to combine > feedback as best I can into one message. > > Bill, you wrote: > > I'm perfectly happy to translate such values to GMT for non-HTTP >> inputs, per spec. If we are going to do so for HTTP inputs, loudly >> scolding the errant developer in the logs seems prudent, for their own >> longer-term benefit. >> > > I suspect this is the core of the argument now. In my opinion, CGI is not > an HTTP input -- either as far as the spec is concerned (for spec lawyering > purposes) or in practice. > >From CGI RFC 3875 sect 1.1; "The Common Gateway Interface (CGI) [22] allows an HTTP [1], [4] server and a CGI script to share responsibility for responding to client requests" It has no other basis of existence, although it could of course be applied to other existing or future protocols. Section 1.4 calls out specific spec terms which differ from their HTTP equivalents. To your argument, section 3.1 has this to say... "The server MUST perform translations and protocol conversions on the client request data required by this specification. Furthermore, the server retains its responsibility to the client to conform to the relevant network protocol even if the CGI script fails to conform to this specification." Conforming can either refer to eliding a header line that doesn't conform, or munging a header line to a usable value, we could interpret that either way. I'll review the rest of your comments shortly, but you might want to review https://tools.ietf.org/html/rfc3875 before claiming CGI isn't an HTTP input :-) Cheers, Bill
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
All right, getting back to this after a week off. I've tried to combine feedback as best I can into one message. Bill, you wrote: I'm perfectly happy to translate such values to GMT for non-HTTP inputs, per spec. If we are going to do so for HTTP inputs, loudly scolding the errant developer in the logs seems prudent, for their own longer-term benefit. I suspect this is the core of the argument now. In my opinion, CGI is not an HTTP input -- either as far as the spec is concerned (for spec lawyering purposes) or in practice. It is a separate (HTTP-like) protocol, and an implementation detail of the server. In other words, we are not "proxying" to a CGI app; the CGI app is contained by the server and is providing inputs to the server response. That is a separate argument from "is it wise to correct GMT timestamps", though. 1. Why do you/reporter want HTTP applications to persist in writing code which breaks between different transport providers/cgi hosting environments? The language has been crystal clear for 2 decades. We do a huge disservice to the PHP author community to let them be idiots. Alternately, the PHP SAPI itself could rectify this. (We aren't talking about non-HTTP sources.) I'm not sure where PHP enters the conversation. They are only one (large and important!) CGI producer; we're talking about our behavior with *all* CGI applications here. I do like your argument that we should do as little transformation as possible, in order to facilitate moving CGI apps between environments. Implementation differences are nasty for everyone. But I'm not convinced that ship hasn't sailed; currently, it looks like we modify outgoing CGI responses in order to merge headers, normalize Content-Type, and produce Unmodified and Precondition Failed responses. There may be others I have missed, but this doesn't look like the behavior of a server that considers itself a transparent "passthrough" to a CGI application. (Isn't that what CGI-NPH is for?) But! I could definitely be swayed otherwise, if that's what we'd like to do moving forwards. I think both sides have potential value, but we should choose one. If there is date input that we cannot handle, the spec strongly encourages us to interpret it as now(), provided we have a clock (which all of our architectures do.) In the absence of a quote from the spec, I'm still in strong disagreement with this, based on the language I quoted last week. Moving on to Stefan's comments: If we see CGI as a kind of input that is not strictly regulated by HTTP header formats (and that is an if), we should correct timezone offset to GMT, but otherwise leave the time unchanged. It might be our clock that has the issue. Meddling with it will not help anyone debugging problems. +1 (and I am currently of the opinion that CGI is not a strict HTTP input, as stated above). If the value is unparseable, we should log it and suppress sending outa "Last-Modified" completely. Also any "If-*" checking should behave as if the header was not present. +1. The alternative is to expect the CGI to honor HTTP/1.1 header semantics, pass values unchanged and let CGI and client run into misunderstandings immediately. Practically, I'm not super opposed to this alternative (but if we choose it, we should apply it consistently). If I put on spec-lawyer hat, the CGI RFC has this to say: [https://tools.ietf.org/html/rfc3875#section-6.2.1] The server MUST make any appropriate modifications to the script's output to ensure that the response to the client complies with the response protocol version. So this alternative is not my first choice. Invalid headers should really either be corrected (if the correction is obvious, safe, and helpful), or dropped entirely. Or the entire response should be 500'd, but we run into major compatibility breaks if we choose that route. And finally, from the latest patch from Luca: 2) Some comments have been added in the code to state clearly that anynon compliant datetime strings will not be interpreted or re-formatted. As stated above, this is not my first choice -- but I wouldn't oppose it if that's what the consensus comes to. else if (!ap_cstr_casecmp(w, "Last-Modified")) { -apr_time_t parsed_date = apr_date_parse_rfc(l); +apr_time_t parsed_date = apr_date_parse_http(l); apr_date_parse_http() is not good enough; IIUC, it completely ignores timezones, which further corrupts non-GMT Last-Modified stamps. We either want strict parsing or actual correction, not something in the middle. --Jacob
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
2016-07-29 10:30 GMT+02:00 Stefan Eissing: > > > Am 28.07.2016 um 18:50 schrieb William A Rowe Jr : > > > > On Thu, Jul 28, 2016 at 10:29 AM, Luca Toscano > wrote: > > > > I'd really like to have more opinions from other readers of the list.. > > > > ++1, we've presented our thoughts, it would be good to have others chime > in. > > If we see CGI as a kind of input that is not strictly regulated by HTTP > header formats (and that is an if), we should correct timezone offset to > GMT, but otherwise leave the time unchanged. It might be our clock that has > the issue. Meddling with it will not help anyone debugging problems. > > If the value is unparseable, we should log it and suppress sending out a > "Last-Modified" completely. Also any "If-*" checking should behave as if > the header was not present. > > The alternative is to expect the CGI to honor HTTP/1.1 header semantics, > pass values unchanged and let CGI and client run into misunderstandings > immediately. > > all my personal opinion. restrictions for backward compat in 2.2/2.4 apply. > > -Stefan Thanks a lot for the feedback Stefan! At this point it seems to me that the best way forward is to amend the current trunk's code with a patch like this one: http://apaste.info/uHT The change tries to satisfy the following feedbacks: 1) A trace1 log about Last-Modified header replace actions is present (not too much verbose). 2) Some comments have been added in the code to state clearly that any non compliant datetime strings will not be interpreted or re-formatted. 3) ap_update_mtime and ap_set_last_modified will be the only functions responsible to check/update the Last-Modified header as it happens now. This seems to me a bit more efficient and we'd avoid replicating the same checks in two different places. 4) Any parsed value ending up in APR_DATE_BAD will be dropped. The logging message about the header replacement will not be as clear as it is now, but the compromise looks good to me. Wording of comments and logging can be changed of course, let me know if the english used is not correct or clear enough. Hope that this patch will be good for everybody. Comments are very welcome and appreciated :) Regards, Luca
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
> Am 28.07.2016 um 18:50 schrieb William A Rowe Jr: > > On Thu, Jul 28, 2016 at 10:29 AM, Luca Toscano wrote: > > I'd really like to have more opinions from other readers of the list.. > > ++1, we've presented our thoughts, it would be good to have others chime in. If we see CGI as a kind of input that is not strictly regulated by HTTP header formats (and that is an if), we should correct timezone offset to GMT, but otherwise leave the time unchanged. It might be our clock that has the issue. Meddling with it will not help anyone debugging problems. If the value is unparseable, we should log it and suppress sending out a "Last-Modified" completely. Also any "If-*" checking should behave as if the header was not present. The alternative is to expect the CGI to honor HTTP/1.1 header semantics, pass values unchanged and let CGI and client run into misunderstandings immediately. all my personal opinion. restrictions for backward compat in 2.2/2.4 apply. -Stefan
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Thu, Jul 28, 2016 at 10:29 AM, Luca Toscanowrote: > I'd really like to have more opinions from other readers of the list.. > ++1, we've presented our thoughts, it would be good to have others chime in.
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
2016-07-28 16:05 GMT+02:00 William A Rowe Jr: > On Thu, Jul 28, 2016 at 8:25 AM, Luca Toscano > wrote: > >> >> The first version of the change tried to solve an actual bug imho, namely >> returning Last-Modified: Thu, 01 Jan 1970 00:00:00 GMT when the FCGI >> backend returned something like Last-Modified: bad-value-here (more >> precisely, trying to translate APR_DATE_BAD into a datetime value). >> > > We all agree that behavior was nonsense, thanks for this aspect of the > changes! > > That catch/fix can certainly be refreshed in the perl-framework already. > > I also agree 100% with Jacob and Yann that adding Last-Modified: now() for >> this use case is not great, we'd add the wrong semantic to an incorrect >> state. Then we tried to focus on how httpd interprets non GMT datetimes, >> discovering that in some cases httpd overrides were not trivial to >> understand. >> > > For example from the original report: >> >> PHP script execution time: 2 seconds >> >> FCGI/PHP header: >> Last-Modified: Fri, 24 Jun 2016 17:21:46 +0200 >> >> HTTPD header: >> Last-Modified: Fri, 24 Jun 2016 15:21:48 GMT (that was basically now() at >> the time) >> >> Httpd replaced the backend Last-Modified value (considered invalid) with >> now(), adding 2 extra seconds to the original value as consequence (httpd + >> PHP processing time) without any trace of the action in the logs. >> > > Yes, at least at loglevel trace, it is good to call out what httpd has > manipulated. > > >> I agree that in the perfect world people should read RFCs and write their >> application accordingly, but most of the times this does not happen and a >> bit of flexibility in a tool like httpd would make a lot of people happier >> :) >> As Yann mentioned it would be great to interpret the RFCs in a way that >> could benefit users giving added value to httpd, but if this is too much we >> can also agree to keep the existing behavior adding only logs for the >> users. The first option seems to be better from the users point of view, >> that's all. >> > > Are you sure? If their CGI/app is going to be thrown away soon, it is > certainly convenient. If their CGI/app will have a life beyond it's current > hosting environment, moved to another web server or cloud platform, we've > just delayed the inevitable of their learning of their mistake. > > I'm perfectly happy to translate such values to GMT for non-HTTP inputs, > per spec. If we are going to do so for HTTP inputs, loudly scolding the > errant developer in the logs seems prudent, for their own longer-term > benefit. > > This is a good point, even if this behavior could be easily get spotted with a bit of testing before serving production traffic on the new infrastructure/platform/etc.. :) As said before, I am ok with both approaches: either the current trunk patch or something less invasive like http://apaste.info/LiB. Yann/Jacob, would it be ok for you to reduce the scope of the patch or is it worth it to keep discussing alternative solutions? In the latter case I'd really like to have more opinions from other readers of the list.. Thanks! Luca
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Thu, Jul 28, 2016 at 8:25 AM, Luca Toscanowrote: > > The first version of the change tried to solve an actual bug imho, namely > returning Last-Modified: Thu, 01 Jan 1970 00:00:00 GMT when the FCGI > backend returned something like Last-Modified: bad-value-here (more > precisely, trying to translate APR_DATE_BAD into a datetime value). > We all agree that behavior was nonsense, thanks for this aspect of the changes! That catch/fix can certainly be refreshed in the perl-framework already. I also agree 100% with Jacob and Yann that adding Last-Modified: now() for > this use case is not great, we'd add the wrong semantic to an incorrect > state. Then we tried to focus on how httpd interprets non GMT datetimes, > discovering that in some cases httpd overrides were not trivial to > understand. > For example from the original report: > > PHP script execution time: 2 seconds > > FCGI/PHP header: > Last-Modified: Fri, 24 Jun 2016 17:21:46 +0200 > > HTTPD header: > Last-Modified: Fri, 24 Jun 2016 15:21:48 GMT (that was basically now() at > the time) > > Httpd replaced the backend Last-Modified value (considered invalid) with > now(), adding 2 extra seconds to the original value as consequence (httpd + > PHP processing time) without any trace of the action in the logs. > Yes, at least at loglevel trace, it is good to call out what httpd has manipulated. > I agree that in the perfect world people should read RFCs and write their > application accordingly, but most of the times this does not happen and a > bit of flexibility in a tool like httpd would make a lot of people happier > :) > As Yann mentioned it would be great to interpret the RFCs in a way that > could benefit users giving added value to httpd, but if this is too much we > can also agree to keep the existing behavior adding only logs for the > users. The first option seems to be better from the users point of view, > that's all. > Are you sure? If their CGI/app is going to be thrown away soon, it is certainly convenient. If their CGI/app will have a life beyond it's current hosting environment, moved to another web server or cloud platform, we've just delayed the inevitable of their learning of their mistake. I'm perfectly happy to translate such values to GMT for non-HTTP inputs, per spec. If we are going to do so for HTTP inputs, loudly scolding the errant developer in the logs seems prudent, for their own longer-term benefit.
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
2016-07-27 20:26 GMT+02:00 William A Rowe Jr: > On Wed, Jul 27, 2016 at 1:18 PM, Luca Toscano > wrote: > >> >> 2016-07-25 14:41 GMT+02:00 Yann Ylavic : >> >>> On Fri, Jul 22, 2016 at 8:42 PM, Jacob Champion >>> wrote: >>> > On 07/22/2016 10:49 AM, William A Rowe Jr wrote: >>> >> >>> >> I'm -1 for interpretating invalid values. >>> > >>> > >>> > By "invalid" do you mean any string that doesn't comply with 723x's >>> > Last-Modified definition? Even if the only non-compliance is the use >>> of a >>> > non-GMT timezone? >>> > >>> > I'm not personally a fan of all the strange date formats handled by the >>> > parse_rfc() function, but it seems like timezone translation is >>> something we >>> > can easily/usefully/safely do. >>> >>> +1 >>> >>> Bill, you previously quoted rfc7231#section-7.1.1.1's: >>> "Recipients of timestamp values are encouraged to be robust in parsing >>> timestamps unless otherwise restricted by the field definition. For >>> example, messages are occasionally forwarded over HTTP from a non-HTTP >>> source that might generate any of the date and time specifications >>> defined by the Internet Message Format. " >>> >>> I read this as being able to parse well known valid dates (GMT or not, >>> which is what parse_rfc() is for), provided we output rfc7231 >>> valid/future/GMT ones only. >>> Why couldn't we do so? >>> >>> > >>> >> The new behavior was wrong, it should be set to now() for all >>> >> invalidinput IMHO >>> > >>> > >>> > In my opinion, turning a completely invalid string (e.g. >>> "Last-Modified: >>> > blahblahblah") into a current Last-Modified stamp *is* interpretation. >>> We'd >>> > be attaching a Last-Modified value to a resource that doesn't really >>> deserve >>> > one. >>> >>> Again +1, either we are strict and error out on bad dates, or we are >>> lenient in what we get (to some extent) and do right thing (GMT) in >>> what we set. >>> >> >> Anybody else that would like to add his opinion? The more the better, I'd >> really like to reach a common agreement! >> > > Thanks for poking and moving this along :) I've stayed quite quiet to > allow > other voices to chime in... > > That still is the current RFC language, to be accomodating to non-HTTP > providers and correct the data. > > Two concerns remain... > > 1. Why do you/reporter want HTTP applications to persist in writing code > which breaks between different transport providers/cgi hosting > environments? > The language has been crystal clear for 2 decades. We do a huge disservice > to the PHP author community to let them be idiots. Alternately, the PHP > SAPI itself could rectify this. (We aren't talking about non-HTTP sources.) > The first version of the change tried to solve an actual bug imho, namely returning Last-Modified: Thu, 01 Jan 1970 00:00:00 GMT when the FCGI backend returned something like Last-Modified: bad-value-here (more precisely, trying to translate APR_DATE_BAD into a datetime value). I also agree 100% with Jacob and Yann that adding Last-Modified: now() for this use case is not great, we'd add the wrong semantic to an incorrect state. Then we tried to focus on how httpd interprets non GMT datetimes, discovering that in some cases httpd overrides were not trivial to understand. For example from the original report: PHP script execution time: 2 seconds FCGI/PHP header: Last-Modified: Fri, 24 Jun 2016 17:21:46 +0200 HTTPD header: Last-Modified: Fri, 24 Jun 2016 15:21:48 GMT (that was basically now() at the time) Httpd replaced the backend Last-Modified value (considered invalid) with now(), adding 2 extra seconds to the original value as consequence (httpd + PHP processing time) without any trace of the action in the logs. I agree that in the perfect world people should read RFCs and write their application accordingly, but most of the times this does not happen and a bit of flexibility in a tool like httpd would make a lot of people happier :) As Yann mentioned it would be great to interpret the RFCs in a way that could benefit users giving added value to httpd, but if this is too much we can also agree to keep the existing behavior adding only logs for the users. The first option seems to be better from the users point of view, that's all. > 2. Precisely which unexpected values have broken the test suite? > I'm -1 to fix the suite until we understand the input data and the > old-vs-new > interpretation of that data. If there is date input that we cannot handle, > the > spec strongly encourages us to interpret it as now(), provided we have a > clock (which all of our architectures do.) > Maybe I am missing something but afaik nothing has broken the test suite. Jacob is trying to add more test cases and his patch adds a completely new test for this specific change to avoid any regressions in the future. Regards, Luca
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Wed, Jul 27, 2016 at 1:18 PM, Luca Toscanowrote: > > 2016-07-25 14:41 GMT+02:00 Yann Ylavic : > >> On Fri, Jul 22, 2016 at 8:42 PM, Jacob Champion >> wrote: >> > On 07/22/2016 10:49 AM, William A Rowe Jr wrote: >> >> >> >> I'm -1 for interpretating invalid values. >> > >> > >> > By "invalid" do you mean any string that doesn't comply with 723x's >> > Last-Modified definition? Even if the only non-compliance is the use of >> a >> > non-GMT timezone? >> > >> > I'm not personally a fan of all the strange date formats handled by the >> > parse_rfc() function, but it seems like timezone translation is >> something we >> > can easily/usefully/safely do. >> >> +1 >> >> Bill, you previously quoted rfc7231#section-7.1.1.1's: >> "Recipients of timestamp values are encouraged to be robust in parsing >> timestamps unless otherwise restricted by the field definition. For >> example, messages are occasionally forwarded over HTTP from a non-HTTP >> source that might generate any of the date and time specifications >> defined by the Internet Message Format. " >> >> I read this as being able to parse well known valid dates (GMT or not, >> which is what parse_rfc() is for), provided we output rfc7231 >> valid/future/GMT ones only. >> Why couldn't we do so? >> >> > >> >> The new behavior was wrong, it should be set to now() for all >> >> invalidinput IMHO >> > >> > >> > In my opinion, turning a completely invalid string (e.g. "Last-Modified: >> > blahblahblah") into a current Last-Modified stamp *is* interpretation. >> We'd >> > be attaching a Last-Modified value to a resource that doesn't really >> deserve >> > one. >> >> Again +1, either we are strict and error out on bad dates, or we are >> lenient in what we get (to some extent) and do right thing (GMT) in >> what we set. >> > > Anybody else that would like to add his opinion? The more the better, I'd > really like to reach a common agreement! > Thanks for poking and moving this along :) I've stayed quite quiet to allow other voices to chime in... That still is the current RFC language, to be accomodating to non-HTTP providers and correct the data. Two concerns remain... 1. Why do you/reporter want HTTP applications to persist in writing code which breaks between different transport providers/cgi hosting environments? The language has been crystal clear for 2 decades. We do a huge disservice to the PHP author community to let them be idiots. Alternately, the PHP SAPI itself could rectify this. (We aren't talking about non-HTTP sources.) 2. Precisely which unexpected values have broken the test suite? I'm -1 to fix the suite until we understand the input data and the old-vs-new interpretation of that data. If there is date input that we cannot handle, the spec strongly encourages us to interpret it as now(), provided we have a clock (which all of our architectures do.)
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
2016-07-25 14:41 GMT+02:00 Yann Ylavic: > On Fri, Jul 22, 2016 at 8:42 PM, Jacob Champion > wrote: > > On 07/22/2016 10:49 AM, William A Rowe Jr wrote: > >> > >> I'm -1 for interpretating invalid values. > > > > > > By "invalid" do you mean any string that doesn't comply with 723x's > > Last-Modified definition? Even if the only non-compliance is the use of a > > non-GMT timezone? > > > > I'm not personally a fan of all the strange date formats handled by the > > parse_rfc() function, but it seems like timezone translation is > something we > > can easily/usefully/safely do. > > +1 > > Bill, you previously quoted rfc7231#section-7.1.1.1's: > "Recipients of timestamp values are encouraged to be robust in parsing > timestamps unless otherwise restricted by the field definition. For > example, messages are occasionally forwarded over HTTP from a non-HTTP > source that might generate any of the date and time specifications > defined by the Internet Message Format. " > > I read this as being able to parse well known valid dates (GMT or not, > which is what parse_rfc() is for), provided we output rfc7231 > valid/future/GMT ones only. > Why couldn't we do so? > > > > >> The new behavior was wrong, it should be set to now() for all > >> invalidinput IMHO > > > > > > In my opinion, turning a completely invalid string (e.g. "Last-Modified: > > blahblahblah") into a current Last-Modified stamp *is* interpretation. > We'd > > be attaching a Last-Modified value to a resource that doesn't really > deserve > > one. > > Again +1, either we are strict and error out on bad dates, or we are > lenient in what we get (to some extent) and do right thing (GMT) in > what we set. > Anybody else that would like to add his opinion? The more the better, I'd really like to reach a common agreement! Luca
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Fri, Jul 22, 2016 at 8:42 PM, Jacob Championwrote: > On 07/22/2016 10:49 AM, William A Rowe Jr wrote: >> >> I'm -1 for interpretating invalid values. > > > By "invalid" do you mean any string that doesn't comply with 723x's > Last-Modified definition? Even if the only non-compliance is the use of a > non-GMT timezone? > > I'm not personally a fan of all the strange date formats handled by the > parse_rfc() function, but it seems like timezone translation is something we > can easily/usefully/safely do. +1 Bill, you previously quoted rfc7231#section-7.1.1.1's: "Recipients of timestamp values are encouraged to be robust in parsing timestamps unless otherwise restricted by the field definition. For example, messages are occasionally forwarded over HTTP from a non-HTTP source that might generate any of the date and time specifications defined by the Internet Message Format. " I read this as being able to parse well known valid dates (GMT or not, which is what parse_rfc() is for), provided we output rfc7231 valid/future/GMT ones only. Why couldn't we do so? > >> The new behavior was wrong, it should be set to now() for all >> invalidinput IMHO > > > In my opinion, turning a completely invalid string (e.g. "Last-Modified: > blahblahblah") into a current Last-Modified stamp *is* interpretation. We'd > be attaching a Last-Modified value to a resource that doesn't really deserve > one. Again +1, either we are strict and error out on bad dates, or we are lenient in what we get (to some extent) and do right thing (GMT) in what we set. Regards, Yann.
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
2016-07-23 1:18 GMT+02:00 Jacob Champion: > On 07/22/2016 01:38 PM, William A Rowe Jr wrote: > >> RFC 7231 § 7.1.1 >> RFC 7232 § 2.2 >> > > Okay, at least we're looking at the same sections then. But I'm not > finding support for your statement that we must replace completely > unintelligible Last-Modified values with current timestamps. The closest I > found was this: > >An origin server with a clock MUST NOT send a Last-Modified date that >>is later than the server's time of message origination (Date). If >>the last modification time is derived from implementation-specific >>metadata that evaluates to some time in the future, according to the >>origin server's clock, then the origin server MUST replace that value >>with the message origination date. >> > > which does not apply to completely invalid data, just future timestamp > metadata. A "Last-Modified: complete-junk" header coming from a CGI > implementation is not a "future" timestamp; it's complete junk, and IMO we > should not promote it to *any* authoritative value. We should treat it as > if no Last-Modified header was sent at all. > > Also, since we're talking about CGI here (which IIUC is an implementation > detail as far as HTTP is concerned), 7.1.1.1 has this to say: > > Note: HTTP requirements for the date/time stamp format apply only >> to their usage within the protocol stream. Implementations are >> not required to use these formats for user presentation, request >> logging, etc. >> > > CGI communication is not part of the protocol stream. As long as we don't > run afoul of any CGI-related RFCs that lock us into a specific > interpretation of junk header data, 723x appears (to me) to have no > prohibition on fixing up or removing bad HTTP-dates coming from an internal > CGI backend. It's up to us to determine what is most correct/useful > behavior. (The situation might be different if we were talking about a > cache or a proxy, but we're not.) > > Am I missing any relevant sections here? > > (Unfortunately I won't be able to continue my part of the conversation > next week since I'll be out of the office; hopefully others will put in > their two cents.) > > First of all thanks all for the good discussion, I am going to add my two cents. So I can see two relevant and separate changes: 1) Any invalid value like "Last-Modified: foo" (that generates a APR_DATE_BAD from APR) should be considered bogus and dropped, leaving a meaningful trace for the admin. This would be great in my opinion since httpd should be a "safe net" for honest mistakes made by backend developers that might accidentally introduce bogus headers like this one. I am not seeing any good use case that would legitimate turning a value like "foo" into a valid GMT datetime. 2) Interpreting header values not strictly allowed turning them into a valid GMT datetime value would be a good feature for httpd and something that our users will like (as stated for example by the person that brought up this issue on users@), but we need of course to consider the RFCs first. As Jacob pointed out I don't see any piece of the 723x RFCs stating clearly that the new proposed behavior is wrong, but I also value a lot William's experience in dealing with RFCs so it might be good to re-think the patch to something less intrusive like http://apaste.info/LiB. This patch would be a compromise between my initial proposal and Yann's. Example: PHP script returning Last-Modified: Sat, 23 Jul 2016 10:00:27 +0200 GMT now: Sat, 23 Jul 2016 08:00:27 GMT Header returned by httpd: Last-Modified: Sat, 23 Jul 2016 08:00:27 GMT Logs: util_script.c(564): [client ::1:44380] Last-Modified: Sat, 23 Jul 2016 10:00:27 +0200 util_script.c(683): [client ::1:44380] The Last-Modified header value 'Sat, 23 Jul 2016 10:00:27 +0200' (in the future considering the GMT timezone) has been replaced with 'Sat, 23 Jul 2016 08:00:27 GMT' http_filters.c(835): [client ::1:44380] Last-Modified: Sat, 23 Jul 2016 08:00:27 GMT The wording could be different of course, I am a non native english speaker so please let me know if there is anything better than these messages. Caveat: any httpd processing time would be added to the final Last-Modified value, something that confused a lot the person that brought up this issue on users@ (original repro case from users@: https://github.com/vaceletm/bug-httpd24/blob/time/headers.php). I am happy with any solution that we choose, as long as we reach a final and shared agreement without getting stuck for too long (with the risk of adding dust and dropping this change completely). Regards, Luca
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On 07/22/2016 01:38 PM, William A Rowe Jr wrote: RFC 7231 § 7.1.1 RFC 7232 § 2.2 Okay, at least we're looking at the same sections then. But I'm not finding support for your statement that we must replace completely unintelligible Last-Modified values with current timestamps. The closest I found was this: An origin server with a clock MUST NOT send a Last-Modified date that is later than the server's time of message origination (Date). If the last modification time is derived from implementation-specific metadata that evaluates to some time in the future, according to the origin server's clock, then the origin server MUST replace that value with the message origination date. which does not apply to completely invalid data, just future timestamp metadata. A "Last-Modified: complete-junk" header coming from a CGI implementation is not a "future" timestamp; it's complete junk, and IMO we should not promote it to *any* authoritative value. We should treat it as if no Last-Modified header was sent at all. Also, since we're talking about CGI here (which IIUC is an implementation detail as far as HTTP is concerned), 7.1.1.1 has this to say: Note: HTTP requirements for the date/time stamp format apply only to their usage within the protocol stream. Implementations are not required to use these formats for user presentation, request logging, etc. CGI communication is not part of the protocol stream. As long as we don't run afoul of any CGI-related RFCs that lock us into a specific interpretation of junk header data, 723x appears (to me) to have no prohibition on fixing up or removing bad HTTP-dates coming from an internal CGI backend. It's up to us to determine what is most correct/useful behavior. (The situation might be different if we were talking about a cache or a proxy, but we're not.) Am I missing any relevant sections here? (Unfortunately I won't be able to continue my part of the conversation next week since I'll be out of the office; hopefully others will put in their two cents.) --Jacob
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
RFC 7231 § 7.1.1 RFC 7232 § 2.2 On Jul 22, 2016 15:01, "Jacob Champion"wrote: > On 07/22/2016 12:30 PM, William A Rowe Jr wrote: > >> Yes, I mean anything that doesn't fit one of the *three* allowable >> formats. >> Nothing is allowed except for GMT. >> > > Agreed, only GMT is allowed on the wire. I still believe it's potentially > useful, and not unsafe, to transform a non-GMT timestamp into a valid GMT > timestamp before putting it on the wire. > > (I'm also not opposed to removing such a header entirely; I just don't > think that's as useful.) > > In my opinion, turning a completely invalid string (e.g. >> "Last-Modified: blahblahblah") into a current Last-Modified stamp >> *is* interpretation. We'd be attaching a Last-Modified value to a >> resource that doesn't really deserve one. >> >> That is precisely what the spec calls us to do. >> > > Can you provide a section? I don't know 723x to the extent I knew 2616 yet. > > --Jacob >
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On 07/22/2016 12:30 PM, William A Rowe Jr wrote: Yes, I mean anything that doesn't fit one of the *three* allowable formats. Nothing is allowed except for GMT. Agreed, only GMT is allowed on the wire. I still believe it's potentially useful, and not unsafe, to transform a non-GMT timestamp into a valid GMT timestamp before putting it on the wire. (I'm also not opposed to removing such a header entirely; I just don't think that's as useful.) In my opinion, turning a completely invalid string (e.g. "Last-Modified: blahblahblah") into a current Last-Modified stamp *is* interpretation. We'd be attaching a Last-Modified value to a resource that doesn't really deserve one. That is precisely what the spec calls us to do. Can you provide a section? I don't know 723x to the extent I knew 2616 yet. --Jacob
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Fri, Jul 22, 2016 at 1:42 PM, Jacob Championwrote: > On 07/22/2016 10:49 AM, William A Rowe Jr wrote: > >> I'm -1 for interpretating invalid values. >> > > By "invalid" do you mean any string that doesn't comply with 723x's > Last-Modified definition? Even if the only non-compliance is the use of a > non-GMT timezone? > > I'm not personally a fan of all the strange date formats handled by the > parse_rfc() function, but it seems like timezone translation is something > we can easily/usefully/safely do. > Yes, I mean anything that doesn't fit one of the *three* allowable formats. Nothing is allowed except for GMT. > The new behavior was wrong, it should be set to now() for all >> invalidinput IMHO >> > > In my opinion, turning a completely invalid string (e.g. "Last-Modified: > blahblahblah") into a current Last-Modified stamp *is* interpretation. We'd > be attaching a Last-Modified value to a resource that doesn't really > deserve one. That is precisely what the spec calls us to do.
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On 07/22/2016 10:49 AM, William A Rowe Jr wrote: I'm -1 for interpretating invalid values. By "invalid" do you mean any string that doesn't comply with 723x's Last-Modified definition? Even if the only non-compliance is the use of a non-GMT timezone? I'm not personally a fan of all the strange date formats handled by the parse_rfc() function, but it seems like timezone translation is something we can easily/usefully/safely do. The new behavior was wrong, it should be set to now() for all invalidinput IMHO In my opinion, turning a completely invalid string (e.g. "Last-Modified: blahblahblah") into a current Last-Modified stamp *is* interpretation. We'd be attaching a Last-Modified value to a resource that doesn't really deserve one. --Jacob
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
I'm -1 for interpretating invalid values. But +1 for alerting the admin of the invalid script/module/CGI. The new behavior was wrong, it should be set to now() for all invalid input IMHO On Jul 21, 2016 5:20 PM, "Jacob Champion"wrote: > On 07/03/2016 02:56 AM, Luca Toscano wrote: > >> Patch committed to trunk in http://svn.apache.org/r1751138 >> Updated backport proposal and trunk's CHANGES. >> > > Attached is an httpd-test patch with some regression cases for this > change. (I'm not sure what the review/commit policies are for that repo; > can anyone enlighten me?) > > Note that apr_date_parse_rfc() handles many more strange and semi-broken > cases than just RFC 822/1123. I am +1 for those but wanted to bring it up > to everyone's attention, just in case. > > --Jacob >
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On 07/03/2016 02:56 AM, Luca Toscano wrote: Patch committed to trunk in http://svn.apache.org/r1751138 Updated backport proposal and trunk's CHANGES. Attached is an httpd-test patch with some regression cases for this change. (I'm not sure what the review/commit policies are for that repo; can anyone enlighten me?) Note that apr_date_parse_rfc() handles many more strange and semi-broken cases than just RFC 822/1123. I am +1 for those but wanted to bring it up to everyone's attention, just in case. --Jacob >From 298f1b53e1da2d5e933f1a69332bbce436c0f482 Mon Sep 17 00:00:00 2001 From: Jacob ChampionDate: Thu, 21 Jul 2016 14:56:55 -0700 Subject: [PATCH] Add Last-Modified header replacement tests Per the long conversation culminating in https://lists.apache.org/thread.html/909bb1b18b723ed0127b062d0f4d18779d2c7a2acddcbd57c7021f51@%3Cdev.httpd.apache.org%3E test that httpd's CGI implementation does the following: - replaces future Last-Modified dates with the current date/time - transforms RFC 822/1123 Last-Modified dates into RFC 723x (GMT) - completely removes unintelligible Last-Modified headers --- t/htdocs/modules/cgi/last-modified.pl.PL | 15 +++ t/modules/cgi.t | 26 +- 2 files changed, 40 insertions(+), 1 deletion(-) create mode 100644 t/htdocs/modules/cgi/last-modified.pl.PL diff --git a/t/htdocs/modules/cgi/last-modified.pl.PL b/t/htdocs/modules/cgi/last-modified.pl.PL new file mode 100644 index 000..898a3f4 --- /dev/null +++ b/t/htdocs/modules/cgi/last-modified.pl.PL @@ -0,0 +1,15 @@ +use strict; + +sub unescape($); + +my $modified = unescape($ENV{QUERY_STRING}); + +print "Content-Type: text/plain\n"; +print "Last-Modified: $modified\n\n"; + +# Naive unescape -- only cares about %20 (space). +sub unescape($) { +my $str = shift(@_); +$str =~ s/%20/ /g; +return $str; +} diff --git a/t/modules/cgi.t b/t/modules/cgi.t index d191d8d..a7221e5 100644 --- a/t/modules/cgi.t +++ b/t/modules/cgi.t @@ -115,7 +115,7 @@ if (!$have_apache_2050 || Apache::TestConfig::WINFU()) { delete @test{qw(stderr1.pl stderr2.pl stderr3.pl nph-stderr.pl)}; } -my $tests = ((keys %test) * 2) + (@post_content * 3) + 4; +my $tests = ((keys %test) * 2) + (@post_content * 3) + 9; plan tests => $tests, \_cgi; my ($expected, $actual); @@ -284,5 +284,29 @@ if (-e $cgi_log) { print "# checking that HEAD $path/perl.pl returns 200.\n"; ok HEAD_RC("$path/perl.pl") == 200; +## Regression tests for Last-Modified header replacement. +my $r; + +$r = GET("$path/last-modified.pl?Sat, 12 Jun 1982 22:12:56 GMT"); +ok t_cmp($r->header('Last-Modified'), 'Sat, 12 Jun 1982 22:12:56 GMT', + 'valid Last-Modified values are passed along'); + +$r = GET("$path/last-modified.pl?invalid"); +ok t_cmp($r->header('Last-Modified'), undef, + 'invalid Last-Modified values are suppressed'); + +$r = GET("$path/last-modified.pl?Sat, 12 Jun 82 22:12:56 -0700"); +ok t_cmp($r->header('Last-Modified'), 'Sun, 13 Jun 1982 05:12:56 GMT', + 'RFC 822 Last-Modified values are rewritten'); + +$r = GET("$path/last-modified.pl?Sat, 12 Jun 1982 22:12:56 -0700"); +ok t_cmp($r->header('Last-Modified'), 'Sun, 13 Jun 1982 05:12:56 GMT', + 'RFC 1123 Last-Modified values are rewritten'); + +$r = GET("$path/last-modified.pl?Tue, 25 Dec 03:14:15 GMT"); +ok t_cmp($r->header('Last-Modified'), $r->header('Client-Date'), + 'future Last-Modified values are replaced'); + + ## clean up unlink $cgi_log; -- 2.7.4
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
2016-07-02 18:27 GMT+02:00 Yann Ylavic: > On Sat, Jul 2, 2016 at 4:39 PM, William A Rowe Jr > wrote: > > Relevant data points... > > > > https://tools.ietf.org/html/rfc7231#section-7.1.1.1 > > > > There is no other supported time zone except GMT representing GMT. That > is > > the only value we may send. > > > > "Recipients of timestamp values are encouraged to be robust in parsing > > timestamps unless otherwise restricted by the field definition. For > example, > > messages are occasionally forwarded over HTTP from a non-HTTP source that > > might generate any of the date and time specifications defined by the > > Internet Message Format. " > > > > We may accept an unusual timezone but should reformat as GMT in the > > response. > > Great, that avoids double-parsing the date (but for TRACE1), so the > patch (attached) is no more introducing more cycles. > Elengant and simple, I like it! For some reason I thought that the "l" variable (containing the LM header sent from FCGI) was lost in the update process, but I was clearly wrong. Next time I'll try harder before committing! Tested with a simple PHP script returning different Last-Modified header values, sleeping two seconds before returning anything (this is useful to check if httpd modifies the original date/time with now): 1) tomorrow - (Last-Modified sent: Mon, 04 Jul 2016 00:00:00 +) [Sun Jul 03 09:29:11.510944 2016] [proxy_fcgi:trace4] [pid 16253:tid 140672292075264] util_script.c(564): [client ::1:52295] Last-Modified: Mon, 04 Jul 2016 00:00:00 + [Sun Jul 03 09:29:11.510955 2016] [proxy_fcgi:trace1] [pid 16253:tid 140672292075264] util_script.c(686): [client ::1:52295] The Last-Modified header value 'Mon, 04 Jul 2016 00:00:00 +' (in the future) has been replaced with 'Sun, 03 Jul 2016 09:29:11 GMT' [Sun Jul 03 09:29:11.510998 2016] [http:trace4] [pid 16253:tid 140672292075264] http_filters.c(850): [client ::1:52295] Last-Modified: Sun, 03 Jul 2016 09:29:11 GMT 2) now in Europe/Paris timezone (Last-Modified sent: Sun, 03 Jul 2016 11:30:10 +0200) [Sun Jul 03 09:30:12.133260 2016] [proxy_fcgi:trace4] [pid 16253:tid 140672275289856] util_script.c(564): [client ::1:52297] Last-Modified: Sun, 03 Jul 2016 11:30:10 +0200 [Sun Jul 03 09:30:12.133272 2016] [proxy_fcgi:trace1] [pid 16253:tid 140672275289856] util_script.c(686): [client ::1:52297] The Last-Modified header value 'Sun, 03 Jul 2016 11:30:10 +0200' (non GMT) has been replaced with 'Sun, 03 Jul 2016 09:30:10 GMT' [Sun Jul 03 09:30:12.133374 2016] [http:trace4] [pid 16253:tid 140672275289856] http_filters.c(850): [client ::1:52297] Last-Modified: Sun, 03 Jul 2016 09:30:10 GMT 3) now in GMT (Last-Modified sent: Sun, 03 Jul 2016 09:31:57 +) [Sun Jul 03 09:31:59.631573 2016] [proxy_fcgi:trace4] [pid 16253:tid 140672250111744] util_script.c(564): [client ::1:52301] Last-Modified: Sun, 03 Jul 2016 09:31:57 + [Sun Jul 03 09:31:59.631635 2016] [http:trace4] [pid 16253:tid 140672250111744] http_filters.c(850): [client ::1:52301] Last-Modified: Sun, 03 Jul 2016 09:31:57 GMT Note that in case 2) httpd does not set now() but uses the original date value sent by FCGI modified to GMT. Patch committed to trunk in http://svn.apache.org/r1751138 Updated backport proposal and trunk's CHANGES. Thanks a lot Yann and William! Regards, Luca
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Sat, Jul 2, 2016 at 4:39 PM, William A Rowe Jrwrote: > Relevant data points... > > https://tools.ietf.org/html/rfc7231#section-7.1.1.1 > > There is no other supported time zone except GMT representing GMT. That is > the only value we may send. > > "Recipients of timestamp values are encouraged to be robust in parsing > timestamps unless otherwise restricted by the field definition. For example, > messages are occasionally forwarded over HTTP from a non-HTTP source that > might generate any of the date and time specifications defined by the > Internet Message Format. " > > We may accept an unusual timezone but should reformat as GMT in the > response. Great, that avoids double-parsing the date (but for TRACE1), so the patch (attached) is no more introducing more cycles. Regards, Yann. Index: server/util_script.c === --- server/util_script.c (revision 1750971) +++ server/util_script.c (working copy) @@ -665,28 +665,25 @@ AP_DECLARE(int) ap_scan_script_header_err_core_ex( * pass it on blindly because of restrictions on future or invalid values. */ else if (!ap_cstr_casecmp(w, "Last-Modified")) { -apr_time_t last_modified_date = apr_date_parse_http(l); -if (last_modified_date != APR_DATE_BAD) { +apr_time_t parsed_date = apr_date_parse_rfc(l); +if (parsed_date != APR_DATE_BAD) { +apr_time_t last_modified_date = parsed_date; +apr_time_t now = apr_time_now(); +if (parsed_date > now) { +last_modified_date = now; +} ap_update_mtime(r, last_modified_date); ap_set_last_modified(r); -if (APLOGrtrace1(r)) { -const char* datestr = apr_table_get(r->headers_out, -"Last-Modified"); -apr_time_t timestamp = apr_date_parse_http(datestr); -if (timestamp < last_modified_date) { -char *last_modified_datestr = apr_palloc(r->pool, - APR_RFC822_DATE_LEN); -apr_rfc822_date(last_modified_datestr, last_modified_date); -ap_log_rerror(SCRIPT_LOG_MARK, APLOG_TRACE1, 0, r, - "The Last-Modified header value '%s' " - "(parsed as RFC822/RFC1123 datetime, " - "that assumes the GMT timezone) " - "has been replaced with: '%s'. " - "An origin server with a clock must not send " - "a Last-Modified date that is later than the " - "server's time of message origination.", - last_modified_datestr, datestr); -} +if (APLOGrtrace1(r) && +(parsed_date > now || + parsed_date != apr_date_parse_http(l))) { +ap_log_rerror(SCRIPT_LOG_MARK, APLOG_TRACE1, 0, r, + "The Last-Modified header value '%s' (%s) " + "has been replaced with '%s'", l, + parsed_date > now ? "in the future" +: "non GMT", + apr_table_get(r->headers_out, +"Last-Modified")); } } else {
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
Relevant data points... https://tools.ietf.org/html/rfc7231#section-7.1.1.1 There is no other supported time zone except GMT representing GMT. That is the only value we may send. "Recipients of timestamp values are encouraged to be robust in parsing timestamps unless otherwise restricted by the field definition. For example, messages are occasionally forwarded over HTTP from a non-HTTP source that might generate any of the date and time specifications defined by the Internet Message Format. " We may accept an unusual timezone but should reformat as GMT in the response. On Jul 2, 2016 6:52 AM, "Yann Ylavic"wrote: > On Sat, Jul 2, 2016 at 2:05 AM, Luca Toscano > wrote: > > > > We have discussed it briefly in another email but didn't reach a > conclusion, > > so I am really happy to re-discuss it again. Maybe an example would > clarify > > what a user will see in the logs. > > How about (modulo quick, dirty and not even compile tested errors): > > Index: server/util_script.c > === > --- server/util_script.c(revision 1750971) > +++ server/util_script.c(working copy) > @@ -665,29 +665,28 @@ AP_DECLARE(int) ap_scan_script_header_err_core_ex( > * pass it on blindly because of restrictions on future or > invalid values. > */ > else if (!ap_cstr_casecmp(w, "Last-Modified")) { > -apr_time_t last_modified_date = apr_date_parse_http(l); > -if (last_modified_date != APR_DATE_BAD) { > -ap_update_mtime(r, last_modified_date); > -ap_set_last_modified(r); > -if (APLOGrtrace1(r)) { > -const char* datestr = apr_table_get(r->headers_out, > -"Last-Modified"); > -apr_time_t timestamp = apr_date_parse_http(datestr); > -if (timestamp < last_modified_date) { > -char *last_modified_datestr = apr_palloc(r->pool, > - > APR_RFC822_DATE_LEN); > -apr_rfc822_date(last_modified_datestr, > last_modified_date); > -ap_log_rerror(SCRIPT_LOG_MARK, APLOG_TRACE1, 0, r, > - "The Last-Modified header value > '%s' " > - "(parsed as RFC822/RFC1123 > datetime, " > - "that assumes the GMT timezone) " > - "has been replaced with: '%s'. " > - "An origin server with a clock > must not send " > - "a Last-Modified date that is > later than the " > - "server's time of message > origination.", > - last_modified_datestr, datestr); > +apr_time_t last_modified_date_rfc = apr_date_parse_rfc(l); > +if (last_modified_date_rfc != APR_DATE_BAD) { > +apr_time_t now = apr_time_now(), > + last_modified_date_gmt = > last_modified_date_rfc; > +if (last_modified_date_rfc != apr_date_parse_http(l)) { > +if (be_strict) { > +/* log an error about non-GMT Last-Modified */ > +return HTTP_BAD_GATEWAY; > } > +if (last_modified_date_rfc <= now) { > +/* be liberal, log a trace1 about forcing gmt > format, > + * i.e. l -> > apr_rfc822_date(last_modified_date_rfc), > + * and then fall through > + */ > +} > } > +if (last_modified_date_rfc > now) { > +/* log about Last-Modified in the future > (debug/info?) */ > +last_modified_date_gmt = now; > +} > +ap_update_mtime(r, last_modified_date_gmt); > +ap_set_last_modified(r); > } > else { > if (APLOGrtrace1(r)) > _ > > Now we use apr_date_parse_rfc(), instead of apr_date_parse_http(), and > hence parse the GMT offset such that we can detect whether the > original Last-Modified is GMT or not. > > If not and we have to be strict return BAD_GATEWAY, otherwise > (lenient) rewrite the Last-Modified header to its GMT form. > > We can also explicitely check if it is in the future (from httpd POV), > and force it to 'now' in this case (if that's the correct behaviour). > > > > > FCGI script returning the following header (Europe/Paris timezone): > > > > Last-Modified: Sat, 02 Jul 2016 01:49:27 +0200 > > > > The current proposed logging (givent a proper LogLevel setting) would be: > > > > [Fri Jul 01 23:49:29.173823 2016] [proxy_fcgi:trace4] [pid 3542:tid > > 140560966862592]
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Sat, Jul 2, 2016 at 2:05 AM, Luca Toscanowrote: > > We have discussed it briefly in another email but didn't reach a conclusion, > so I am really happy to re-discuss it again. Maybe an example would clarify > what a user will see in the logs. How about (modulo quick, dirty and not even compile tested errors): Index: server/util_script.c === --- server/util_script.c(revision 1750971) +++ server/util_script.c(working copy) @@ -665,29 +665,28 @@ AP_DECLARE(int) ap_scan_script_header_err_core_ex( * pass it on blindly because of restrictions on future or invalid values. */ else if (!ap_cstr_casecmp(w, "Last-Modified")) { -apr_time_t last_modified_date = apr_date_parse_http(l); -if (last_modified_date != APR_DATE_BAD) { -ap_update_mtime(r, last_modified_date); -ap_set_last_modified(r); -if (APLOGrtrace1(r)) { -const char* datestr = apr_table_get(r->headers_out, -"Last-Modified"); -apr_time_t timestamp = apr_date_parse_http(datestr); -if (timestamp < last_modified_date) { -char *last_modified_datestr = apr_palloc(r->pool, - APR_RFC822_DATE_LEN); -apr_rfc822_date(last_modified_datestr, last_modified_date); -ap_log_rerror(SCRIPT_LOG_MARK, APLOG_TRACE1, 0, r, - "The Last-Modified header value '%s' " - "(parsed as RFC822/RFC1123 datetime, " - "that assumes the GMT timezone) " - "has been replaced with: '%s'. " - "An origin server with a clock must not send " - "a Last-Modified date that is later than the " - "server's time of message origination.", - last_modified_datestr, datestr); +apr_time_t last_modified_date_rfc = apr_date_parse_rfc(l); +if (last_modified_date_rfc != APR_DATE_BAD) { +apr_time_t now = apr_time_now(), + last_modified_date_gmt = last_modified_date_rfc; +if (last_modified_date_rfc != apr_date_parse_http(l)) { +if (be_strict) { +/* log an error about non-GMT Last-Modified */ +return HTTP_BAD_GATEWAY; } +if (last_modified_date_rfc <= now) { +/* be liberal, log a trace1 about forcing gmt format, + * i.e. l -> apr_rfc822_date(last_modified_date_rfc), + * and then fall through + */ +} } +if (last_modified_date_rfc > now) { +/* log about Last-Modified in the future (debug/info?) */ +last_modified_date_gmt = now; +} +ap_update_mtime(r, last_modified_date_gmt); +ap_set_last_modified(r); } else { if (APLOGrtrace1(r)) _ Now we use apr_date_parse_rfc(), instead of apr_date_parse_http(), and hence parse the GMT offset such that we can detect whether the original Last-Modified is GMT or not. If not and we have to be strict return BAD_GATEWAY, otherwise (lenient) rewrite the Last-Modified header to its GMT form. We can also explicitely check if it is in the future (from httpd POV), and force it to 'now' in this case (if that's the correct behaviour). > > FCGI script returning the following header (Europe/Paris timezone): > > Last-Modified: Sat, 02 Jul 2016 01:49:27 +0200 > > The current proposed logging (givent a proper LogLevel setting) would be: > > [Fri Jul 01 23:49:29.173823 2016] [proxy_fcgi:trace4] [pid 3542:tid > 140560966862592] util_script.c(564): [client ::1:52263] Last-Modified: > Sat, 02 Jul 2016 01:49:27 +0200 > > [Fri Jul 01 23:49:29.173833 2016] [proxy_fcgi:trace1] [pid 3542:tid > 140560966862592] util_script.c(688): [client ::1:52263] The Last-Modified > header value 'Sat, 02 Jul 2016 01:49:27 GMT' (parsed as RFC882/RFC1123 > datetime, that assumes the GMT timezone) has been replaced with: 'Fri, 01 > Jul 2016 23:49:29 GMT'. An origin server with a clock must not send a > Last-Modified date that is later than the server's time of message > origination. The issue with the original code is that it does not really detect if the Last-Modified header is in the future since it ignores the timezone/offset. The RFC does not tell us either what we should do if a timezone different than GMT is specified (I read it as: if there is no timezone assume GMT, but here
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
2016-07-01 22:31 GMT+02:00 Yann Ylavic: > On Fri, Jul 1, 2016 at 10:17 PM, William A Rowe Jr > wrote: > > On Fri, Jul 1, 2016 at 2:58 PM, Yann Ylavic > wrote: > >> > >> On Fri, Jul 1, 2016 at 6:32 PM, Luca Toscano > >> wrote: > >> > > >> > "The Last-Modified header value '%s' (parsed assuming the GMT > timezone) > >> > has > >> > been replaced with '%s' because considered in the future." > >> > >> Looks good to me (maybe "(GMT)" only between parentheses?). > >> > >> The original log message can still be switched to a comment, though ;) > > > > > > I'm not fond of the 'assuming' thing, per RFC2616 it *is* defined as GMT. > > > > (parsed as GMT, as required) > > > > might be a way to phrase that? Other words that came to mind were > > 'as defined', 'per spec', etc. > > > > Showing a value 'datetime (CEST)' (GMT) is unnecessarily confusing. > +1, really like the "as required", it is more readable and meaningful. > > Hmm, isn't "(CEST)" if there recognized by the parser (so to "correct" > the compared epoch)? > If so, this looks more like a bad "Last-Modified" than a future one > (even if it's the case). > (Sorry I didn't follow the discussion about this issue). > > Anyway, if we can find a timezone string in the header, "(GMT)" alone > may be indeed confusing, but so is "parsed as GMT" IMHO. > > PS: if that has been discussed already, feel free to ignore me, I'll > go looking at the thread :) > We have discussed it briefly in another email but didn't reach a conclusion, so I am really happy to re-discuss it again. Maybe an example would clarify what a user will see in the logs. FCGI script returning the following header (Europe/Paris timezone): Last-Modified: Sat, 02 Jul 2016 01:49:27 +0200 The current proposed logging (givent a proper LogLevel setting) would be: [Fri Jul 01 23:49:29.173823 2016] [proxy_fcgi:trace4] [pid 3542:tid 140560966862592] util_script.c(564): [client ::1:52263] Last-Modified: Sat, 02 Jul 2016 01:49:27 +0200 [Fri Jul 01 23:49:29.173833 2016] [proxy_fcgi:trace1] [pid 3542:tid 140560966862592] util_script.c(688): [client ::1:52263] The Last-Modified header value 'Sat, 02 Jul 2016 01:49:27 GMT' (parsed as RFC882/RFC1123 datetime, that assumes the GMT timezone) has been replaced with: 'Fri, 01 Jul 2016 23:49:29 GMT'. An origin server with a clock must not send a Last-Modified date that is later than the server's time of message origination. [Fri Jul 01 23:49:29.173876 2016] [http:trace4] [pid 3542:tid 140560966862592] http_filters.c(835): [client ::1:52263] Last-Modified: Fri, 01 Jul 2016 23:49:29 GMT Notice that the second log (the one that I added) shows both dates in GMT, no mention of +0200. I didn't find a way to log the original value in the code section that I changed (probably due to my inexperience), so I relied on the fact that there is a complete dump of the FCGI headers before it. This is why I added "parsed as RFC882/RFC1123 datetime, that assumes the GMT timezone". I also got some feedback from users@ that the log flow was clear so I proceeded with the commit. As William wrote in the other thread, https://tools.ietf.org/html/rfc2616#section-3.3.1 states that the GMT timezone must be assumed by a datestr parser, and this is exactly what apr_date_parse_http seems to be doing. "The Last-Modified header value '%s' (parsed as GMT date as required) has been replaced with '%s' because considered in the future." could be a possible solution. If this is still confusing, we could remove the "parsed as .." bit. Thanks! Regards, Luca
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Fri, Jul 1, 2016 at 10:17 PM, William A Rowe Jrwrote: > On Fri, Jul 1, 2016 at 2:58 PM, Yann Ylavic wrote: >> >> On Fri, Jul 1, 2016 at 6:32 PM, Luca Toscano >> wrote: >> > >> > "The Last-Modified header value '%s' (parsed assuming the GMT timezone) >> > has >> > been replaced with '%s' because considered in the future." >> >> Looks good to me (maybe "(GMT)" only between parentheses?). >> >> The original log message can still be switched to a comment, though ;) > > > I'm not fond of the 'assuming' thing, per RFC2616 it *is* defined as GMT. > > (parsed as GMT, as required) > > might be a way to phrase that? Other words that came to mind were > 'as defined', 'per spec', etc. > > Showing a value 'datetime (CEST)' (GMT) is unnecessarily confusing. Hmm, isn't "(CEST)" if there recognized by the parser (so to "correct" the compared epoch)? If so, this looks more like a bad "Last-Modified" than a future one (even if it's the case). (Sorry I didn't follow the discussion about this issue). Anyway, if we can find a timezone string in the header, "(GMT)" alone may be indeed confusing, but so is "parsed as GMT" IMHO. PS: if that has been discussed already, feel free to ignore me, I'll go looking at the thread :)
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Fri, Jul 1, 2016 at 2:58 PM, Yann Ylavicwrote: > On Fri, Jul 1, 2016 at 6:32 PM, Luca Toscano > wrote: > > > > "The Last-Modified header value '%s' (parsed assuming the GMT timezone) > has > > been replaced with '%s' because considered in the future." > > Looks good to me (maybe "(GMT)" only between parentheses?). > > The original log message can still be switched to a comment, though ;) > I'm not fond of the 'assuming' thing, per RFC2616 it *is* defined as GMT. (parsed as GMT, as required) might be a way to phrase that? Other words that came to mind were 'as defined', 'per spec', etc. Showing a value 'datetime (CEST)' (GMT) is unnecessarily confusing.
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Fri, Jul 1, 2016 at 6:32 PM, Luca Toscanowrote: > > "The Last-Modified header value '%s' (parsed assuming the GMT timezone) has > been replaced with '%s' because considered in the future." Looks good to me (maybe "(GMT)" only between parentheses?). The original log message can still be switched to a comment, though ;) Regards, Yann.
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
Hi Yann! 2016-07-01 18:02 GMT+02:00 Yann Ylavic: > On Fri, Jul 1, 2016 at 5:00 PM, wrote: > > Author: elukey > > Date: Fri Jul 1 15:00:42 2016 > > New Revision: 1750953 > > > > URL: http://svn.apache.org/viewvc?rev=1750953=rev > > Log: > > Fixed typo in log message, wrong RFC mentioned. > > > > Modified: > > httpd/httpd/trunk/server/util_script.c > > > > Modified: httpd/httpd/trunk/server/util_script.c > > URL: > http://svn.apache.org/viewvc/httpd/httpd/trunk/server/util_script.c?rev=1750953=1750952=1750953=diff > > > == > > --- httpd/httpd/trunk/server/util_script.c (original) > > +++ httpd/httpd/trunk/server/util_script.c Fri Jul 1 15:00:42 2016 > > @@ -679,7 +679,7 @@ AP_DECLARE(int) ap_scan_script_header_er > > apr_rfc822_date(last_modified_datestr, > last_modified_date); > > ap_log_rerror(SCRIPT_LOG_MARK, APLOG_TRACE1, 0, > r, > >"The Last-Modified header value > '%s' " > > - "(parsed as RFC882/RFC1123 > datetime, " > > + "(parsed as RFC822/RFC1123 > datetime, " > >"that assumes the GMT timezone) " > >"has been replaced with: '%s'. " > >"An origin server with a clock > must not send " > > This message is possibly too long, you shouldn't cite the RFC, IMHO. > Something like "The Last-Modified header value '%s' (in the future) > has been replaced with '%s'" is probably enough. > Thanks a lot for the review. This error message is the result of a long email thread in users@ to make very clear that a Last-Modified value sent from FCGI/CGI is assumed to be in GMT because of what stated in RFC822/RFC1123. Maybe something like the following could work better? "The Last-Modified header value '%s' (parsed assuming the GMT timezone) has been replaced with '%s' because considered in the future." Regards, Luca
Re: svn commit: r1750953 - /httpd/httpd/trunk/server/util_script.c
On Fri, Jul 1, 2016 at 5:00 PM,wrote: > Author: elukey > Date: Fri Jul 1 15:00:42 2016 > New Revision: 1750953 > > URL: http://svn.apache.org/viewvc?rev=1750953=rev > Log: > Fixed typo in log message, wrong RFC mentioned. > > Modified: > httpd/httpd/trunk/server/util_script.c > > Modified: httpd/httpd/trunk/server/util_script.c > URL: > http://svn.apache.org/viewvc/httpd/httpd/trunk/server/util_script.c?rev=1750953=1750952=1750953=diff > == > --- httpd/httpd/trunk/server/util_script.c (original) > +++ httpd/httpd/trunk/server/util_script.c Fri Jul 1 15:00:42 2016 > @@ -679,7 +679,7 @@ AP_DECLARE(int) ap_scan_script_header_er > apr_rfc822_date(last_modified_datestr, > last_modified_date); > ap_log_rerror(SCRIPT_LOG_MARK, APLOG_TRACE1, 0, r, >"The Last-Modified header value '%s' " > - "(parsed as RFC882/RFC1123 datetime, " > + "(parsed as RFC822/RFC1123 datetime, " >"that assumes the GMT timezone) " >"has been replaced with: '%s'. " >"An origin server with a clock must > not send " This message is possibly too long, you shouldn't cite the RFC, IMHO. Something like "The Last-Modified header value '%s' (in the future) has been replaced with '%s'" is probably enough. Regards, Yann. > >