After looking at the URI RFCs (twilight zone), here is a basic algorithm which should be applied to the request:
1: Unreserved chars should always be unescaped (if they are escaped). This produces a normalized request URI; assuming there are no illegal chars (see #3), this is what will go into the log file:

   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

2: Reserved chars should be left as-is:

   reserved    = gen-delims / sub-delims
   gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
               / "*" / "+" / "," / ";" / "="

3: Any other char appearing unescaped should be considered an exploit or a client error. If the URL is logged, the logged form might end with the percent-encoded representation of the first illegal char, plus a response code identifying the client error.

Since this decoding/normalization step should take place when processing the start-line (request-line), this might be a good time to update AOLserver to handle absolute URIs in the request-line, which will also require changes to the logic used for (virtual) host identification.

The supporting text is from RFC 3986:

   2.1.  Percent-Encoding

   A percent-encoding mechanism is used to represent a data octet in a
   component when that octet's corresponding character is outside the
   allowed set or is being used as a delimiter of, or within, the
   component.  A percent-encoded octet is encoded as a character
   triplet, consisting of the percent character "%" followed by the two
   hexadecimal digits representing that octet's numeric value.  For
   example, "%20" is the percent-encoding for the binary octet
   "00100000" (ABNF: %x20), which in US-ASCII corresponds to the space
   character (SP).  Section 2.4 describes when percent-encoding and
   decoding is applied.

      pct-encoded = "%" HEXDIG HEXDIG

   The uppercase hexadecimal digits 'A' through 'F' are equivalent to
   the lowercase digits 'a' through 'f', respectively.  If two URIs
   differ only in the case of hexadecimal digits used in percent-encoded
   octets, they are equivalent.  For consistency, URI producers and
   normalizers should use uppercase hexadecimal digits for all
   percent-encodings.
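The three rules above can be sketched in C, the language AOLserver is written in. This is a minimal, hypothetical sketch and not AOLserver code: `normalize_uri`, its status codes, and the choice to write the first illegal char into the output as %XX are assumptions for illustration. It treats non-ASCII bytes as illegal, unescapes only unreserved octets, and keeps every other escaped octet escaped with uppercase hex digits, per RFC 3986 section 2.1.

```c
#include <stddef.h>
#include <string.h>

/* Status codes for this hypothetical normalizer (not an AOLserver API). */
enum { URI_OK = 0, URI_BAD_CHAR = -1, URI_BAD_ESCAPE = -2 };

/* unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"  (ASCII only) */
static int is_unreserved(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* reserved = gen-delims / sub-delims */
static int is_reserved(int c)
{
    return c != '\0' && strchr(":/?#[]@!$&'()*+,;=", c) != NULL;
}

/* Value of one hex digit (either case), or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Append "%XX" (uppercase hex) for octet v and return the new end. */
static char *put_pct(char *out, unsigned v)
{
    static const char hex[] = "0123456789ABCDEF";
    *out++ = '%';
    *out++ = hex[(v >> 4) & 0xF];
    *out++ = hex[v & 0xF];
    return out;
}

/*
 * Normalize the request-target src into dst, which must hold at least
 * strlen(src) + 4 bytes. Returns URI_OK, or an error status with *bad
 * set to the offset of the first illegal char; in that case dst holds
 * the normalized prefix followed by that char in %XX form (for logging).
 */
int normalize_uri(const char *src, char *dst, size_t *bad)
{
    size_t i = 0;
    char *out = dst;

    while (src[i] != '\0') {
        unsigned char c = (unsigned char)src[i];

        if (c == '%') {
            int hi = hex_value((unsigned char)src[i + 1]);
            /* Only read src[i + 2] if src[i + 1] was a hex digit (not NUL). */
            int lo = (hi < 0) ? -1 : hex_value((unsigned char)src[i + 2]);

            if (lo < 0) {                   /* malformed escape */
                *bad = i;
                out = put_pct(out, c);
                *out = '\0';
                return URI_BAD_ESCAPE;
            }
            unsigned v = (unsigned)(hi * 16 + lo);
            if (is_unreserved((int)v)) {
                *out++ = (char)v;           /* rule 1: unescape unreserved */
            } else {
                out = put_pct(out, v);      /* stays escaped, uppercase hex */
            }
            i += 3;
        } else if (is_unreserved(c) || is_reserved(c)) {
            *out++ = (char)c;               /* rules 1 and 2: copy as-is */
            i++;
        } else {
            *bad = i;                       /* rule 3: illegal char */
            out = put_pct(out, c);
            *out = '\0';
            return URI_BAD_CHAR;
        }
    }
    *out = '\0';
    return URI_OK;
}
```

For example, "/%7Efoo/a%2fb" normalizes to "/~foo/a%2Fb": the escaped "~" is unreserved and gets decoded, while the escaped "/" stays escaped, because decoding it would turn a data octet into a path delimiter. "/bad path" fails with URI_BAD_CHAR at offset 4 and leaves "/bad%20" in dst, ready to be logged alongside whatever error status (or dropped connection) the server chooses.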
Anyway, it is critical to examine and normalize the request URI as early as possible and to act quickly when presented with invalid chars.

tom jackson

On Sat, Sep 11, 2010 at 4:00 PM, Tom Jackson <t...@rmadilo.com> wrote:
> This is not an AOLserver issue to write a log file that is safe for
> broken programs. If there are illegal chars in the URL, maybe reject
> the request outright. If the chars are legal, then there isn't much
> else to be done.
>
> Chances are the chars in question should be escaped in the URL, so the
> request should be rejected. Although it might be nice to inform the
> client, it might be okay and more safe to just drop the request with
> no response.
>
> tom jackson
>
>
> On Fri, Sep 10, 2010 at 9:34 AM, Dossy Shiobara <do...@panoptic.com> wrote:
>> Fair enough. ;-)
>>
>> On 9/10/10 2:07 AM, Gustaf Neumann wrote:
>>> The information loss (changing ESC to the bell character 7) is very
>>> little;
>>> under normal operation, you should never have a bell character in the
>>> log file, and now, if you see one, it should ring a bell....
>>
>> --
>> Dossy Shiobara | do...@panoptic.com | http://dossy.org/
>> Panoptic Computer Network | http://panoptic.com/
>> "He realized the fastest way to change is to laugh at your own
>> folly -- then you can let go and quickly move on." (p. 70)
>>
>>
>> --
>> AOLserver - http://www.aolserver.com/
>>
>> To Remove yourself from this list, simply send an email to
>> <lists...@listserv.aol.com> with the
>> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject:
>> field of your email blank.
>>
>