Hi Stefan,

Thanks for this work, but I don't consider HTTP conformance to be an option. These are checks we should be making while parsing the received message, not as a separate pass, and in many cases they are required to result in a 400, 500, or 502 response.
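
To illustrate what checking during parsing (rather than in a separate pass) could look like, here is a minimal Python sketch. It is not httpd code: `parse_request_line` and its return convention are hypothetical, the line is assumed to arrive with its CRLF already stripped, and an HTTP/0.9 request line without a protocol field is simply rejected here.

```python
import re

# Hypothetical helper, not httpd code: the request line is validated as it is
# split into fields, and the error status is decided at that point.
CTL = re.compile(r"[\x00-\x1f\x7f]")  # ASCII control characters


def parse_request_line(line):
    """Return (error_status, fields); error_status is None on success."""
    if CTL.search(line):
        return 400, None  # control character in the request line
    parts = line.split(" ")
    if len(parts) != 3 or not all(parts):
        # Missing field, empty field, or garbage after the protocol.
        # Assumption: HTTP/0.9 lines without a protocol are not accepted.
        return 400, None
    method, target, version = parts
    return None, {"method": method, "target": target, "version": version}
```

A caller would send the returned status as the response as soon as it is not `None`, without running any later conformance pass.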

I am trying to get HTTPbis ready for last call this week. After that, I will be looking into making the changes in httpd, and I won't be using a configurable option. I suggest we just remove that part and iterate on these checks as we go.

The current HTTP/1.1 drafts are at

   http://svn.tools.ietf.org/svn/wg/httpbis/draft-ietf-httpbis/latest/

and the message parsing requirements are in p1-messaging.html

BTW, the protocol version is now restricted to uppercase and one digit per major and minor, so we can simplify that check to a very specific "HTTP/[0-9].[0-9]".

And, yes, it is possible to have a valid empty Host because HTTP can be used with a proxy for any URI (including URNs).

....Roy

On Dec 29, 2012, at 5:23 PM, s...@apache.org wrote:

> Author: sf
> Date: Sun Dec 30 01:23:24 2012
> New Revision: 1426877
>
> URL: http://svn.apache.org/viewvc?rev=1426877&view=rev
> Log:
> Add an option to enforce stricter HTTP conformance
>
> This is a first stab, the checks will likely have to be revised.
> For now, we check
>
> * if the request line contains control characters
> * if the request uri has fragment or username/password
> * that the request method is standard or registered with RegisterHttpMethod
> * that the request protocol is of the form HTTP/[1-9]+.[0-9]+,
>   or missing for 0.9
> * if there is garbage in the request line after the protocol
> * if any request header contains control characters
> * if any request header has an empty name
> * for the host name in the URL or Host header:
>   - if an IPv4 dotted decimal address: Reject octal or hex values, require
>     exactly four parts
>   - if a DNS host name: Reject non-alphanumeric characters besides '.' and
>     '-'. As a side effect, this rejects multiple Host headers.
> * if any response header contains control characters
> * if any response header has an empty name
> * that the Location response header (if present) has a valid scheme and is
>   absolute
>
> If we have a host name both from the URL and the Host header, we replace the
> Host header with the value from the URL to enforce RFC conformance.
>
> There is a log-only mode, but the loglevels of the logged messages need some
> thought/work. Currently, the checks for incoming data log for 'core' and the
> checks for outgoing data log for 'http'. Maybe we need a way to configure the
> loglevels separately from the core/http loglevels.
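
For reference, the simplified protocol check and the host name checks discussed above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code from r1426877: the function names are hypothetical, the "." in "HTTP/[0-9].[0-9]" is read as a literal dot, and a leading zero in an IPv4 part is treated as an octal value.

```python
import re

# "HTTP/[0-9].[0-9]" from the message, with the dot taken literally.
VERSION = re.compile(r"HTTP/[0-9]\.[0-9]")


def valid_version(proto):
    """Uppercase "HTTP/", one digit each for major and minor."""
    return VERSION.fullmatch(proto) is not None


def valid_ipv4(host):
    """Exactly four decimal parts in 0..255; hex and octal are rejected."""
    parts = host.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        if re.fullmatch(r"[0-9]{1,3}", part) is None:
            return False  # empty, hex ("0x7f"), or otherwise non-decimal
        if len(part) > 1 and part[0] == "0":
            return False  # assumption: a leading zero means octal
        if int(part) > 255:
            return False
    return True


def valid_dns_name(host):
    """Only alphanumerics, '.' and '-'.

    An empty Host, which can be valid, has to be handled by the caller.
    """
    return re.fullmatch(r"[A-Za-z0-9.-]+", host) is not None
```

For example, `valid_version("HTTP/1.10")` and `valid_ipv4("0x7f.0.0.1")` are both false in this sketch, while `valid_dns_name("www.example.org")` is true.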