After looking at the URI RFCs (twilight zone), here is a basic
algorithm that should be applied to the request URI:

1: Unreserved chars should always be unescaped (if they are escaped),
producing a normalized request URI (assuming there are no illegal chars
as described in #3, this is what will go into the log file):

unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

2: Reserved chars should be left as-is:

reserved    = gen-delims / sub-delims

      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

3: Any other char (except a "%" that starts a valid pct-encoded
triplet) should be considered an exploit or client error. If the URL
is logged, it might end with the %-encoded representation of the first
illegal char, plus a response code to identify the client error.
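A minimal sketch of the three rules, in Python for brevity (AOLserver
itself is written in C). The names normalize_request_uri and
IllegalUriChar are hypothetical. Two details are assumptions of this
sketch, not part of the rules above: escapes that are kept (reserved or
non-ASCII octets) get uppercase hex digits, following the RFC 3986
section 2.1 recommendation, and an illegal non-ASCII char is %-encoded
as its UTF-8 bytes for the log.

```python
import string

# Character classes from RFC 3986, sections 2.2 and 2.3.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
RESERVED = frozenset(":/?#[]@" + "!$&'()*+,;=")
HEXDIGITS = frozenset(string.hexdigits)


class IllegalUriChar(ValueError):
    """Hypothetical error for rule 3. `logged` holds the part normalized
    so far plus the %-encoding of the first illegal char."""

    def __init__(self, logged):
        super().__init__(logged)
        self.logged = logged


def normalize_request_uri(uri):
    out = []
    i = 0
    while i < len(uri):
        ch = uri[i]
        if ch == "%":
            hexpair = uri[i + 1:i + 3]
            if len(hexpair) == 2 and all(c in HEXDIGITS for c in hexpair):
                decoded = chr(int(hexpair, 16))
                if decoded in UNRESERVED:
                    # Rule 1: an escaped unreserved char is unescaped.
                    out.append(decoded)
                else:
                    # Keep the escape (assumption: uppercase the hex digits).
                    out.append("%" + hexpair.upper())
                i += 3
                continue
            # A "%" without two hex digits is illegal (rule 3); log it as %25.
            raise IllegalUriChar("".join(out) + "%25")
        if ch in UNRESERVED or ch in RESERVED:
            # Rules 1 and 2: literal unreserved and reserved chars stay as-is.
            out.append(ch)
            i += 1
            continue
        # Rule 3: anything else; encode the offending char (as UTF-8 bytes).
        encoded = "".join("%%%02X" % b for b in ch.encode("utf-8"))
        raise IllegalUriChar("".join(out) + encoded)
    return "".join(out)
```

For example, "/%7Euser/a%2fb" normalizes to "/~user/a%2Fb", while
"/foo\x1b[31m" is rejected with "/foo%1B" available for the log. An
escaped reserved char such as %2F stays encoded, because decoding it
would change how the path splits into segments.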

Since this decoding/normalization step should take place when
processing the start-line (request-line), this might be a good time to
update AOLserver to handle absolute URIs, which will also require
changes to the logic used for (virtual) host identification.
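Handling an absolute URI in the request-line could look roughly like
the sketch below. It assumes a three-part HTTP/1.x request-line, uses
Python's urllib.parse.urlsplit, and does not handle the "OPTIONS *"
form; split_request_line is a hypothetical name. When the target is
absolute, RFC 2616 section 5.2 says the host is taken from the URI and
any Host header is ignored, which is where the (virtual) host
identification logic has to change.

```python
from urllib.parse import urlsplit


def split_request_line(line, host_header=None):
    """Return (method, host, path_and_query, version) for a request-line."""
    # Assumes exactly "METHOD SP request-target SP HTTP-version".
    method, target, version = line.split(" ")
    if target.startswith("/"):
        # Path-only target: the virtual host comes from the Host header.
        return method, host_header, target, version
    parts = urlsplit(target)
    if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
        raise ValueError("unsupported request-target: %r" % target)
    # Absolute URI: the host comes from the URI, the Host header is ignored.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return method, parts.netloc, path, version
```

With this split, the path handed to the normalization step is the
same whether the client sent "/foo" or "http://example.com/foo".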

The supporting text is in RFC 3986:

2.1.  Percent-Encoding

   A percent-encoding mechanism is used to represent a data octet in a
   component when that octet's corresponding character is outside the
   allowed set or is being used as a delimiter of, or within, the
   component.  A percent-encoded octet is encoded as a character
   triplet, consisting of the percent character "%" followed by the two
   hexadecimal digits representing that octet's numeric value.  For
   example, "%20" is the percent-encoding for the binary octet
   "00100000" (ABNF: %x20), which in US-ASCII corresponds to the space
   character (SP).  Section 2.4 describes when percent-encoding and
   decoding is applied.

      pct-encoded = "%" HEXDIG HEXDIG

   The uppercase hexadecimal digits 'A' through 'F' are equivalent to
   the lowercase digits 'a' through 'f', respectively.  If two URIs
   differ only in the case of hexadecimal digits used in percent-encoded
   octets, they are equivalent.  For consistency, URI producers and
   normalizers should use uppercase hexadecimal digits for all percent-
   encodings.
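To make the quoted percent-encoding rules concrete, here is a small
Python sketch (function names are hypothetical): one helper encodes a
single octet as a triplet with uppercase hex digits, and the other
uppercases the hex digits of existing triplets, so URIs that differ
only in hex case compare equal after normalization.

```python
import re

# pct-encoded = "%" HEXDIG HEXDIG
PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def pct_encode_octet(octet):
    """Encode one octet (0-255) as a pct-encoded triplet, uppercase hex."""
    if not 0 <= octet <= 255:
        raise ValueError("not an octet: %r" % octet)
    return "%%%02X" % octet


def normalize_pct_case(uri):
    """Uppercase the hex digits of every pct-encoded triplet in uri."""
    return PCT_TRIPLET.sub(lambda m: m.group(0).upper(), uri)


# Octet 00100000 (SP) encodes as "%20"; "%2f" and "%2F" are equivalent.
assert pct_encode_octet(0b00100000) == "%20"
assert normalize_pct_case("/a%2fb") == normalize_pct_case("/a%2Fb")
```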

Anyway, it is critical to examine and normalize the request URI as
early as possible and act immediately when presented with invalid chars.

tom jackson


On Sat, Sep 11, 2010 at 4:00 PM, Tom Jackson <t...@rmadilo.com> wrote:
> It is not AOLserver's job to write a log file that is safe for
> broken programs. If there are illegal chars in the URL, maybe reject
> the request outright. If the chars are legal, then there isn't much
> else to be done.
>
> Chances are the chars in question should have been escaped in the
> URL, so the request should be rejected. Although it might be nice to
> inform the client, it might be okay, and safer, to just drop the
> request with no response.
>
> tom jackson
>
>
> On Fri, Sep 10, 2010 at 9:34 AM, Dossy Shiobara <do...@panoptic.com> wrote:
>>  Fair enough.  ;-)
>>
>> On 9/10/10 2:07 AM, Gustaf Neumann wrote:
>>> The information loss (changing ESC to the bell character 7) is very
>>> little; under normal operation, you should never have a bell
>>> character in the log file, and now, if you see one, it should ring
>>> a bell....
>>
>> --
>> Dossy Shiobara              | do...@panoptic.com | http://dossy.org/
>> Panoptic Computer Network   | http://panoptic.com/
>>  "He realized the fastest way to change is to laugh at your own
>>    folly -- then you can let go and quickly move on." (p. 70)
>>
>>
>> --
>> AOLserver - http://www.aolserver.com/
>>
>> To Remove yourself from this list, simply send an email to 
>> <lists...@listserv.aol.com> with the
>> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
>> field of your email blank.
>>
>

