On Wed, Nov 08, 2000 at 12:32:49PM +0100, Karlsson Kent - keka wrote:
> 
> 
> > -----Original Message-----
> > From: Roozbeh Pournader [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, November 08, 2000 11:58 AM
> > To: Linux i18n
> > Subject: Sorting and combining diacritical marks
> > 
> > 
> > 
> > A quick look at the ISO 14651 tables,
> > http://www.iso.ch/ittf/ISO14651_2000_TABLE1.htm, shows that
> > U+0308 (COMBINING DIAERESIS) is in the table. But it is excluded from
> > the "iso14651_t1" file in glibc. Does that mean the programmer should
> > normalize the strings before sorting, or will the umlaut be ignored?
> 
> I just took a quick look at iso14651_t1.  It seems to be extremely
> old, and very limited.  Not at all like the "2000" table, which
> 1) covers all of Unicode 2.1, and 2) does handle combining diacritics.
> I guess Keld might know the details as to why the "t1" table has
> not been updated.

Ulrich took the table from an older draft, in which uppercase
and lowercase letters were kept separate. As a result, a regular
expression like [a-c]* does not match an initial uppercase
letter. Matching it could cause surprises for users, e.g. when
deleting files and not expecting files with initial uppercase
letters to be deleted.
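
To illustrate the point, here is a minimal sketch of how a range such as
[a-c] could behave under two collation orders. The two orderings and the
helper names (INTERLEAVED, SEPARATED, in_range, matched) are made up for
this example; they are not the actual glibc or ISO 14651 tables, and the
sketch assumes a range is resolved purely by collation position.

```python
# Hypothetical collation orders, for illustration only
# (NOT the real glibc or ISO 14651 tables).
INTERLEAVED = "aAbBcCdD"  # each uppercase letter sorts next to its lowercase
SEPARATED = "abcdABCD"    # all lowercase letters sort before all uppercase


def in_range(ch, lo, hi, order):
    """Return True if ch collates between lo and hi (inclusive) in order."""
    pos = order.index
    return pos(lo) <= pos(ch) <= pos(hi)


def matched(lo, hi, order):
    """List every character of order that the range [lo-hi] would cover."""
    return [ch for ch in order if in_range(ch, lo, hi, order)]


if __name__ == "__main__":
    # With interleaved cases the range also picks up uppercase letters.
    print(matched("a", "c", INTERLEAVED))
    # With the cases kept separate the range covers lowercase letters only.
    print(matched("a", "c", SEPARATED))
```

Under the interleaved ordering, something like "rm [a-c]*" could also hit
files starting with A or B; with the cases separated it would not.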

Keld

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
