Re: apr_token_* conclusions (was: Better casecmpstr[n]?)

William A Rowe Jr Wed, 25 Nov 2015 10:17:01 -0800

On Nov 25, 2015 12:00, "Mikhail T." <mi+t...@aldan.algebra.com> wrote:
>
> On 25.11.2015 12:42, William A Rowe Jr wrote:
>>
>> If the script switches setlocale to turkish, for example, our
>> forced-lowercase content-type conversion will cause "IMAGE/GIF" to
>> become "ımage/gıf", clearly not what the specs intended.
>
> I'm sorry, could you elaborate on this? Would not strtolower(3) convert
> "IMAGE/GIF" to "image/gif" in all locales -- including "C"? At least, in
> all single-byte charsets -- such as the Turkish ISO 8859-9? Yes, the
> function will act differently on the strings containing octets above 127,
> but those would occur neither in content-types nor in header-names...


Two locale categories, LC_CTYPE and LC_COLLATE, control this text
processing behavior (LC_CTYPE governs tolower()/toupper() and character
classification; LC_COLLATE governs strcoll()-style ordering).  The above
is the correct lowercase mapping for Turkish.  In German, the uppercase
counterpart of sharp-S ß is 'SS', but multi-character mappings cannot be
expressed by the simple tolower/toupper functions.

Consider that this is a function of language, not of 'charset' per se.
The same charset behaves differently depending on the locale's language.
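To see the effect directly, here is a minimal sketch (not httpd or APR
code) that lowercases an ASCII token with tolower() under a given LC_CTYPE
and then restores the caller's locale.  lower_in_locale is a hypothetical
helper, and the locale name "tr_TR.ISO-8859-9" used in the usage note is
an assumption: which locales exist, and how they map 'I', depends on the
platform.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: lowercase `in` into `out` with tolower() while
 * LC_CTYPE is set to `locale`, then restore the caller's LC_CTYPE.
 * Returns 0 on success, or -1 if the locale is unavailable or `out`
 * is too small. */
static int lower_in_locale(const char *locale, const char *in,
                           char *out, size_t outlen)
{
    char saved[256];
    const char *cur;
    size_t i, len;

    assert(locale != NULL && in != NULL && out != NULL);
    len = strlen(in);
    cur = setlocale(LC_CTYPE, NULL);   /* query only, no change */
    if (cur == NULL || strlen(cur) >= sizeof saved || len >= outlen)
        return -1;
    strcpy(saved, cur);                /* later setlocale() calls may reuse cur's storage */

    if (setlocale(LC_CTYPE, locale) == NULL)
        return -1;                     /* locale not installed here */

    for (i = 0; i < len; i++)          /* tolower() needs an unsigned char value */
        out[i] = (char)tolower((unsigned char)in[i]);
    out[len] = '\0';

    setlocale(LC_CTYPE, saved);        /* restore the caller's locale */
    return 0;
}
```

With "C" the result for "IMAGE/GIF" is "image/gif".  With a Turkish 8-bit
locale, where one is installed, the 'I' bytes may come back as 0xFD (ı in
ISO-8859-9).  Note that setlocale() changes process-wide state, so even
this probe is unsafe while other threads depend on the current locale.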

>> Adding unambiguous token handling functions would be good for the few
>> case-insensitive string comparison, string folding, and search
>> functions. It allows the spec-consumer to trust their string processing.
>
> Up until now, I thought, the thread was about coming up with a short-cut
> -- an optimization for processing tokens, like request-headers, which are
> known to be in US-ASCII anyway and where using locale-aware functions is
> simply wasteful -- but not incorrect.

Partially so; that was the motivation behind the proposal.  Apparently
OS X in particular has a slow implementation of strcasecmp(), even when
running under the POSIX locale.
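The token functions under discussion boil down to case-insensitive
comparison that folds only the 26 ASCII letters and never consults the
locale.  A minimal sketch follows; token_fold, token_casecmp and
token_ncasecmp are hypothetical names, not APR API.  The code treats its
input as raw US-ASCII octets (hence the hex constants), and the guess that
skipping locale lookups makes it cheaper than a locale-aware strcasecmp()
is an assumption about typical implementations, not a measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Fold one octet: only 0x41-0x5A ('A'-'Z') change; everything else,
 * including octets above 127, passes through untouched. */
static unsigned char token_fold(unsigned char c)
{
    return (c >= 0x41 && c <= 0x5A) ? (unsigned char)(c + 0x20) : c;
}

/* strcasecmp()-style comparison of two NUL-terminated tokens using
 * ASCII-only folding; the sign of the result follows the folded bytes. */
static int token_casecmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    unsigned char x, y;

    assert(a != NULL && b != NULL);
    do {
        x = token_fold(*p++);
        y = token_fold(*q++);
    } while (x == y && x != '\0');
    return (int)x - (int)y;
}

/* Same as token_casecmp(), but compares at most n octets. */
static int token_ncasecmp(const char *a, const char *b, size_t n)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    assert(a != NULL && b != NULL);
    for (; n > 0; n--, p++, q++) {
        unsigned char x = token_fold(*p), y = token_fold(*q);
        if (x != y || x == '\0')
            return (int)x - (int)y;
    }
    return 0;
}
```

Because octets above 127 are never folded, "\xDD" and "\xFD" (İ and ı in
ISO 8859-9) stay distinct from each other and from any ASCII letter, no
matter what setlocale() has done.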

> You seem to imply, the locale-aware functions might be doing the wrong
> thing some times -- and this confuses me...

Until the APR consumer, including an instance of httpd, actually calls
setlocale(), everything should behave as expected, because a C program
starts out in the "C" locale.  If your in-process code under httpd calls
setlocale() to customize its behavior based on the HTTP consumer's locale,
that is when things may go badly under the hood, in both httpd and APR.

But yes, I flagged this to the security team almost immediately, and then
had to research what could introduce such a vulnerability: accepting
unexpected input and treating it as valid ASCII.  I was less concerned
about the reverse, valid ASCII being treated as opaque text and rejected.
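One way for in-process code to notice that a setlocale() call has made
ctype.h case mapping unsafe for protocol tokens is to probe tolower() and
toupper() across all 256 octets.  The sketch below assumes an 8-bit char
on an ASCII-based platform; ctype_ascii_mismatches is a hypothetical
helper, not part of httpd or APR.  It counts ASCII octets whose mapping
differs from plain A-Z/a-z folding, and non-ASCII octets that map onto
ASCII, the latter being the "unexpected input treated as valid ASCII"
case.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: count octets whose tolower()/toupper() results
 * under the current LC_CTYPE are unsafe for ASCII protocol tokens.
 * Assumes ASCII, so 0x41-0x5A are 'A'-'Z' and 0x61-0x7A are 'a'-'z'. */
static int ctype_ascii_mismatches(void)
{
    int c, bad = 0;

    for (c = 0; c < 256; c++) {
        int lo = tolower(c), up = toupper(c);

        /* ctype.h contract: results stay within the unsigned char range */
        assert(lo >= 0 && lo <= 255 && up >= 0 && up <= 255);

        if (c < 0x80) {
            /* ASCII input must map exactly as plain A-Z/a-z folding does */
            int want_lo = (c >= 0x41 && c <= 0x5A) ? c + 0x20 : c;
            int want_up = (c >= 0x61 && c <= 0x7A) ? c - 0x20 : c;
            if (lo != want_lo || up != want_up)
                bad++;
        } else if (lo < 0x80 || up < 0x80) {
            /* a non-ASCII octet folds onto ASCII: it could pass as a token */
            bad++;
        }
    }
    return bad;
}
```

In the "C" locale the count is 0.  Run after a setlocale() call, a nonzero
result means comparisons built on tolower() or strcasecmp() should no
longer be trusted for header names or media types.  In a Turkish 8-bit
locale, for instance, 'I' may lowercase to dotless ı and İ to an ASCII
'i', so both kinds of mismatch may be reported.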
