So it seems pretty absurd that we are coming back to this over three years later, but is there any reason to preserve pre-RFC 2068 behaviors? I appreciate that Stefan was trying to avoid harming existing deployment scenarios, but even as I'm about to propose that we backport all of this to 2.4 and 2.2, I have several questions:

1. offer a logging-only option? Why? It seems like a simple choice: follow the spec, or don't. If you want to see what's going on, Wireshark, Fiddler and dozens of other tools let you inspect the conversation.

2. leave the default as 'not-strict'? Seems we should most strongly recommend that the server observe RFCs 2068, 2616 and 723x, and not tolerate ancient behavior by default unless the admin insists on being foolish.

3. retain these legacy faulty behaviors in httpd 2.next/3.0? Seems that once we agree on a backport, the ancient side of this logic should all just disappear from trunk.

4. detail the error to the error log? Again, there are inspection tools, but more importantly, no visual user-agent is going to send this garbage, and automated requests are going to discard the 400 response. Seems we can save a lot of code by throwing away the details that just don't help, and are generally the product of abusive traffic.

Thoughts?

-----------------

http://svn.apache.org/viewvc?view=revision&revision=1426877

Author: sf
Date: Sun Dec 30 01:23:24 2012 UTC (3 years, 7 months ago)
Changed paths: 9
Log Message:

Add an option to enforce stricter HTTP conformance

This is a first stab, the checks will likely have to be revised.
For now, we check

 * if the request line contains control characters
 * if the request uri has fragment or username/password
 * that the request method is standard or registered with RegisterHttpMethod
 * that the request protocol is of the form HTTP/[1-9]+.[0-9]+, or missing for 0.9
 * if there is garbage in the request line after the protocol

 * if any request header contains control characters
 * if any request header has an empty name
 * for the host name in the URL or Host header:
   - if an IPv4 dotted decimal address: Reject octal or hex values, require exactly four parts
   - if a DNS host name: Reject non-alphanumeric characters besides '.' and '-'.
     As a side effect, this rejects multiple Host headers.

 * if any response header contains control characters
 * if any response header has an empty name
 * that the Location response header (if present) has a valid scheme and is absolute

If we have a host name both from the URL and the Host header, we replace the Host header with the value from the URL to enforce RFC conformance.

There is a log-only mode, but the loglevels of the logged messages need some thought/work. Currently, the checks for incoming data log for 'core' and the checks for outgoing data log for 'http'. Maybe we need a way to configure the loglevels separately from the core/http loglevels.
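
For anyone who wants to see what the request-side checks in the log message amount to, here is a rough Python sketch of a few of them. It is not the httpd code (which is C) and only approximates the list above. The names `check_request_line`, `check_host` and `KNOWN_METHODS` are made up for illustration, the method set stands in for "standard or registered with RegisterHttpMethod", and the rule for deciding that a host "looks like" an IPv4 address is my own assumption. Username/password in the URI, ports, IPv6 literals and header checks are left out.

```python
import re

# Pattern from the commit log, HTTP/[1-9]+.[0-9]+, with the dot escaped.
PROTOCOL_RE = re.compile(r"HTTP/[1-9]+\.[0-9]+")
# Hypothetical stand-in for "standard or registered with RegisterHttpMethod".
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS", "TRACE", "CONNECT"}

# Assumption: a host is treated as an IPv4 candidate when every dot-separated
# part is all digits or starts with a 0x prefix.
NUMERIC_RE = re.compile(r"[0-9]+|0[xX][0-9a-fA-F]*")
DEC_OCTET_RE = re.compile(r"0|[1-9][0-9]{0,2}")   # plain decimal, no leading zero
DNS_NAME_RE = re.compile(r"[A-Za-z0-9.-]+")


def has_control_chars(text):
    """True if text contains an ASCII control character (0x00-0x1F or 0x7F)."""
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in text)


def check_request_line(line):
    """Return a list of problems found in a request line (CRLF already removed)."""
    problems = []
    if has_control_chars(line):
        problems.append("control character in request line")
    parts = line.split(" ")
    if len(parts) < 2:
        problems.append("malformed request line")
        return problems
    method, uri = parts[0], parts[1]
    if method not in KNOWN_METHODS:
        problems.append("unknown request method")
    if "#" in uri:
        problems.append("fragment in request URI")
    if len(parts) > 2:  # a missing protocol means HTTP/0.9 and is accepted
        if not PROTOCOL_RE.fullmatch(parts[2]):
            problems.append("invalid protocol")
        if len(parts) > 3:
            problems.append("garbage after protocol")
    return problems


def check_host(host):
    """Return a problem description for a host name, or None if it passes."""
    labels = host.split(".")
    if all(NUMERIC_RE.fullmatch(label) for label in labels):
        if len(labels) != 4:
            return "IPv4 address must have exactly four parts"
        for label in labels:
            if not DEC_OCTET_RE.fullmatch(label) or int(label) > 255:
                return "IPv4 part must be plain decimal"
        return None
    if not DNS_NAME_RE.fullmatch(host):
        return "invalid character in host name"
    return None
```

With this sketch, `check_request_line("GET / HTTP/1.1 junk")` reports the trailing garbage, and `check_host("0177.0.0.1")` is rejected because of the octal-looking first part. A value such as `"a.example.com, b.example.com"` fails the character check because of the comma and space, which is one way the "rejects multiple Host headers" side effect could come about, assuming the repeated headers were joined into a single comma-separated value before the check runs.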