On Nov 25, 2015 12:00, "Mikhail T." <mi+t...@aldan.algebra.com> wrote:
>
> On 25.11.2015 12:42, William A Rowe Jr wrote:
>>
>> If the script switches setlocale to turkish, for example, our forced-lowercase content-type conversion
>> will cause "IMAGE/GIF" to become "ımage/gıf", clearly not what the specs intended.
>
> I'm sorry, could you elaborate on this? Would not strtolower(3) convert "IMAGE/GIF" to "image/gif" in all locales -- including "C"? At least, in all single-byte charsets -- such as the Turkish ISO 8859-9? Yes, the function will act differently on the strings containing octets above 127, but those would occur neither in content-types nor in header-names...

Two variables, LC_CTYPE and LC_COLLATE, control this text-processing behavior. The above is the correct lowercase conversion for Turkish. In German, the uppercase equivalent of sharp-S (ß) is 'SS', but multi-character translation is not provided by the simple tolower/toupper functions. Consider that this is a function of language, not of 'charset' per se. The same charset behaves differently depending on the locale's language.

>> Adding unambiguous token handling functions would be good for the few case-insensitive string comparison, string folding, and search functions. It allows the spec-consumer to trust their string processing.
>
> Up until now, I thought, the thread was about coming up with a short-cut -- an optimization for processing tokens, like request-headers, which are known to be in US-ASCII anyway and where using locale-aware functions is simply wasteful -- but not incorrect.

Partially so; that was the motivation behind the proposal. Apparently OS X in particular has a slow implementation of strcasecmp, even when running under the POSIX locale.

> You seem to imply, the locale-aware functions might be doing the wrong thing some times -- and this confuses me...

Until the APR consumer, including an instance of httpd, actually calls setlocale(), everything should behave as expected. If your in-process code under httpd calls setlocale() to customize its behavior based on the HTTP consumer's locale, that is when things may go badly under the hood in both httpd and APR.

But yes, I flagged this to the security team almost immediately, and then had to research what could introduce such a vulnerability: accepting unexpected input and treating it as valid ASCII. I was less concerned about the opposite case, treating valid ASCII as opaque text, since that text would simply be rejected.
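
To make the "unambiguous token handling" idea concrete, here is a minimal sketch of locale-independent folding and comparison. The names ascii_tolower, ascii_strtolower and ascii_strcasecmp are hypothetical, chosen for illustration only; they are not existing APR or httpd functions. The sketch also assumes an ASCII execution character set (it would not be correct on EBCDIC platforms).

```c
/* Hypothetical helper, not an APR or httpd API: fold one octet to lower
 * case using only the US-ASCII range. The process locale is never consulted. */
int ascii_tolower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical helper: lower-case a protocol token in place,
 * for example a content type. Octets above 127 are left untouched. */
void ascii_strtolower(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)ascii_tolower((unsigned char)*s);
}

/* Hypothetical helper: case-insensitive comparison for protocol tokens.
 * Returns 0 when equal, otherwise the difference of the first
 * mismatching (folded) octets. */
int ascii_strcasecmp(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;

    for (;; ua++, ub++) {
        int ca = ascii_tolower(*ua);
        int cb = ascii_tolower(*ub);
        if (ca != cb)
            return ca - cb;
        if (ca == '\0')
            return 0;
    }
}
```

Because none of these functions consult the locale, "IMAGE/GIF" folds to "image/gif" no matter what setlocale() call in-process code has made.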