So it seems pretty absurd we are coming back to this over
three years later, but is there any reason to preserve pre-RFC 2068
behaviors? I appreciate that Stefan was trying to avoid harming
existing deployment scenarios, but even as I'm about to propose
that we backport all of this to 2.4 and 2.2, I have several questions:

1. offer a logging-only option? Why? It seems like a simple
   choice: follow the spec, or don't. If you want to see what's
   going on, Wireshark, Fiddler and dozens of other tools let
   you inspect the conversation.

2. leave the default as 'not-strict'? Seems we should most
   strongly recommend that the server observe RFCs 2068,
   2616 and 723x, and not tolerate ancient behavior by default
   unless the admin insists on being foolish.

3. retain these legacy faulty behaviors in httpd 2.next/3.0?
   Seems that once we agree on a backport, the ancient
   side of this logic should all just disappear from trunk.

4. detail the error to the error log? Again, there are inspection
   tools, but more importantly, no visual user-agent is going
   to send this garbage, and automated requests are going
   to discard the 400 response. Seems we can save a lot of
   code throwing away the details that just don't help, and
   are generally the product of abusive traffic.

Thoughts?


-----------------

http://svn.apache.org/viewvc?view=revision&revision=1426877
Author: sf
Date: Sun Dec 30 01:23:24 2012 UTC (3 years, 7 months ago)
Changed paths: 9
Log Message:
Add an option to enforce stricter HTTP conformance

This is a first stab, the checks will likely have to be revised.
For now, we check

 * if the request line contains control characters
 * if the request uri has fragment or username/password
 * that the request method is standard or registered with RegisterHttpMethod
 * that the request protocol is of the form HTTP/[1-9]+.[0-9]+,
   or missing for 0.9
 * if there is garbage in the request line after the protocol
 * if any request header contains control characters
 * if any request header has an empty name
 * for the host name in the URL or Host header:
   - if an IPv4 dotted decimal address: Reject octal or hex values, require
     exactly four parts
   - if a DNS host name: Reject non-alphanumeric characters besides '.' and
     '-'. As a side effect, this rejects multiple Host headers.
 * if any response header contains control characters
 * if any response header has an empty name
 * that the Location response header (if present) has a valid scheme and is
   absolute
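
For readers who want to see roughly what a few of these checks amount to,
here is a minimal sketch in Python. It is not the httpd code. The function
names are hypothetical, the protocol pattern is the one from the log message
with the '.' assumed to be a literal dot, and the rule for deciding whether a
host "looks like" an IPv4 address (last label starts with a digit) and the
0-255 range check are assumptions made for illustration. Any ":port" suffix
is assumed to have been stripped already.

```python
import re

# Pattern taken from the log message; '.' is assumed to mean a literal dot.
PROTOCOL_RE = re.compile(r"HTTP/[1-9]+\.[0-9]+")
# DNS names: alphanumerics plus '.' and '-' only.
DNS_HOST_RE = re.compile(r"[A-Za-z0-9.-]+")
DECIMAL_RE = re.compile(r"[0-9]+")


def has_control_chars(text):
    """True if text contains an ASCII control character (0x00-0x1F or 0x7F)."""
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in text)


def protocol_ok(proto):
    """Accept HTTP/x.y, or an empty protocol (treated as HTTP/0.9)."""
    return proto == "" or PROTOCOL_RE.fullmatch(proto) is not None


def ipv4_ok(host):
    """Require exactly four plain decimal parts; reject octal and hex forms."""
    parts = host.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        if DECIMAL_RE.fullmatch(part) is None:  # rejects hex such as 0x7f
            return False
        if len(part) > 1 and part[0] == "0":    # rejects octal such as 010
            return False
        if int(part) > 255:                     # assumption: range-check too
            return False
    return True


def host_ok(host):
    """Validate a host from the URL or Host header (port already removed)."""
    last_label = host.split(".")[-1]
    if last_label[:1].isdigit():  # assumption: numeric last label means IPv4
        return ipv4_ok(host)
    return DNS_HOST_RE.fullmatch(host) is not None
```

With these definitions, a value such as "a.example.com, b.example.com" fails
host_ok() because of the comma and space, which is how a merged pair of Host
headers ends up rejected as a side effect.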

If we have a host name both from the URL and the Host header, we replace the
Host header with the value from the URL to enforce RFC conformance.
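
The Host replacement described above can be sketched as follows. This is an
illustration only: enforce_url_host is a hypothetical name, the headers are
modelled as a plain dict, and the URL is assumed to have already passed the
username/password check so that the authority part is just the host.

```python
from urllib.parse import urlsplit


def enforce_url_host(request_target, headers):
    """Return a copy of headers whose Host matches an absolute request URL.

    If the request target is not an absolute URL, headers are returned
    unchanged (as a copy).
    """
    result = dict(headers)
    parts = urlsplit(request_target)
    if parts.scheme and parts.netloc:
        # Host from the URL takes precedence over the Host header.
        result["Host"] = parts.netloc
    return result
```

For example, a request for "http://example.org/index.html" carrying
"Host: other.example" would be processed as if it had sent
"Host: example.org".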

There is a log-only mode, but the loglevels of the logged messages need some
thought/work. Currently, the checks for incoming data log for 'core' and the
checks for outgoing data log for 'http'. Maybe we need a way to configure the
loglevels separately from the core/http loglevels.
