RE: apr_token_* conclusions (was: Better casecmpstr[n]?)

Bert Huijben Wed, 25 Nov 2015 15:29:12 -0800

See http://www.siao2.com/2004/12/03/274288.aspx


And http://www.siao2.com/2013/04/04/10407543.aspx

For some background and related bugs in several products.

 

I hope this blog will stay online (the author passed away recently).

 

                Bert

 

From: Bert Huijben [mailto:[email protected]] 
Sent: Thursday, 26 November 2015 00:22
To: [email protected]
Subject: RE: apr_token_* conclusions (was: Better casecmpstr[n]?)

 

The example was the other way around. Changing SS to ß is not a valid 
transform, but the other way around (ß to SS) is. There are also transforms 
involving the combined AE characters (Æ/æ), etc.

 

That Turkish ‘I’ problem is the only case I know of where the collation 
actually changes behavior within the usual Western alphabet of ASCII characters.

 

                Bert

 

 

From: Mikhail T. [mailto:[email protected]] 
Sent: Wednesday, 25 November 2015 23:19
To: [email protected] <mailto:[email protected]> 
Subject: Re: apr_token_* conclusions (was: Better casecmpstr[n]?)

 

On 25.11.2015 14:10, Mikhail T. wrote:

> Two variables, LC_CTYPE and LC_COLLATE, control this text-processing behavior.
> The above is the correct lowercase transliteration for Turkish. In German,
> the uppercase counterpart of sharp-S ß is 'SS', but multi-character
> translation is not provided by the simple tolower/toupper functions.
>
> So, the concern is that some hypothetical header, such as X-ASSIGN-TO, may,
> after going through the locale-aware strtolower(), unexpectedly become
> x-aßign-to?

I just tested the above on both FreeBSD and Linux, and the results are 
encouraging:

% echo STRASSE | env LANG=de_DE.ISO8859 tr '[[:upper:]]' '[[:lower:]]'
strasse

Thus, I contend that using the C library will not cause invalid results, and 
that the only reason to have Apache's own implementation is performance, not 
correctness.

-mi
