Re: i18n codepage guidance needed

Branko Čibej Wed, 13 Apr 2011 12:20:51 -0700

On 12.04.2011 21:37, William A. Rowe Jr. wrote:
> On 4/12/2011 11:56 AM, Jeff Trawick wrote:
>> On Tue, Apr 12, 2011 at 12:29 PM, William A. Rowe Jr.
>> <wr...@rowe-clan.net> wrote:
>>> I have one dev question for my apr_fnmatch() refactoring
>>>
>>> Today we lowercase the two characters (and don't support case-insensitive
>>> range matches at all, I won't change this apr-specific quirk).  But IIRC
>>> there are languages with multiple lower case representations of the same
>>> upper case character, but never (or at least, rarely) vice versa?
>>>
>>> Shouldn't we upcase both the text and match chars, instead, to better
>>> support non-ASCII locales?  (Obviously, this ignores utf-8 issues, and
>>> I'm not going to enable MBCS in this next release, but will at least make
>>> it possible to enhance for MBCS later on, without changing fn prototypes).
>> No real answer, just some comments...
>>
>> * FWLIW, it is tolower() now "just because."  It was originally toupper().
>> * For interesting text, it could change behavior, and we don't have
>> bugs filed now, right?
>> * For interesting text, neither toupper() nor tolower() nor == is
>> correct!  (So don't bother changing behavior.)
> I think I found the answer to "just because", thanks Deutschlanders... from
> the linux manpage...
>
>   In some non-English locales, there are lowercase letters with no
>   corresponding uppercase equivalent; the German sharp s is one example.
>
> Still pondering.


The only marginally safe comparison would be strcoll on whole
non-wildcard subsequences of the pattern, and even that isn't guaranteed
to work because the filesystem (it's for fnmatch, right?) can have a
different collation than the current locale, thank you NTFS.
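
A minimal sketch of that idea, assuming a single-byte encoding and that
the caller has already called setlocale(LC_COLLATE, ...). The helper
names literal_run_length() and literal_run_equal() are hypothetical and
are not part of APR. Note that strcoll() equality is only as
case-insensitive as the locale's collation makes it, and it says nothing
about how the filesystem itself collates names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the leading run of pattern characters
 * that are not fnmatch metacharacters (*, ?, [ or backslash). */
static size_t literal_run_length(const char *pattern)
{
    return strcspn(pattern, "*?[\\");
}

/* Hypothetical helper: compare the first len bytes of text against the
 * first len bytes of seg as whole strings under the current LC_COLLATE.
 * Returns 1 if they collate equal, 0 if not (or if either input is
 * shorter than len), and -1 if memory allocation fails. */
static int literal_run_equal(const char *text, const char *seg, size_t len)
{
    char *a, *b;
    int rc;

    if (strlen(text) < len || strlen(seg) < len)
        return 0;

    a = malloc(len + 1);
    b = malloc(len + 1);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }

    /* strcoll() needs NUL-terminated strings, so copy out the slices. */
    memcpy(a, text, len);
    a[len] = '\0';
    memcpy(b, seg, len);
    b[len] = '\0';

    rc = (strcoll(a, b) == 0);

    free(a);
    free(b);
    return rc;
}
```

A matcher could call literal_run_length() at each position where the
pattern resumes after a wildcard, then hand that run to
literal_run_equal() instead of folding case one character at a time.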

-- Brane
