strcoll for utf-8

Paul Michel Mon, 07 Jan 2002 11:26:13 -0800


After reading a past discussion about UTF-8
support in glibc 2.2, I am still not sure what the
conclusion was regarding strcoll.
As I understand it, all the char functions work on bytes.
None of them handles UTF-8 in the sense of recognising
UTF-8 encoded characters; they only see bytes. Depending
on the kind of processing they do, some of them still
handle UTF-8 data correctly (e.g. strcpy).
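
To illustrate the point about bytes versus characters, here is a
minimal sketch in C. The helper names utf8_char_count and
copy_roundtrips are made up for this example and are not part of
glibc; the counter assumes its input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count UTF-8 encoded characters by skipping
   continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: copy src into dst with strcpy and report
   whether every byte survived. strcpy never looks inside the bytes,
   so multi-byte sequences are copied unchanged. dst must be large
   enough to hold src including the terminating NUL. */
int copy_roundtrips(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    strcpy(dst, src);
    return strcmp(src, dst) == 0;
}
```

For the two-byte sequence 0xC3 0xB6 (ö), strlen reports 2 while
utf8_char_count reports 1, yet strcpy still copies the string
intact because it only moves bytes.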


IMHO, strcoll cannot correctly handle UTF-8 encoded
characters, since collation needs explicit knowledge of
characters. For instance, the collation rules for Finnish
treat specially some letters that are encoded in more
than one byte in UTF-8 (e.g. ö, 0xC3 0xB6 in
UTF-8).
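
A small sketch of why plain byte order is not enough: the Finnish
alphabet ends with å, ä, ö in that order, but in UTF-8 å is
0xC3 0xA5 and ä is 0xC3 0xA4, so a byte comparison puts ä first.
The helper names byte_order and locale_order are made up for this
example, and the locale name "fi_FI.UTF-8" mentioned afterwards is
an assumption about what may be installed. The code only reports
what strcoll returns; it does not claim what any particular glibc
version does.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

#define A_RING "\xC3\xA5" /* å in UTF-8 */
#define A_UML  "\xC3\xA4" /* ä in UTF-8 */

/* Normalise a comparison result to -1, 0 or 1. */
static int sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Hypothetical helper: order of a and b by raw byte values. */
int byte_order(const char *a, const char *b)
{
    return sign(strcmp(a, b));
}

/* Hypothetical helper: order of a and b according to strcoll under
   the named locale. Returns 1 and stores the sign in *result if the
   locale could be selected, 0 if it is not available. Resets
   LC_COLLATE to "C" afterwards, which assumes the caller had not
   selected another collation locale. */
int locale_order(const char *locale_name,
                 const char *a, const char *b, int *result)
{
    assert(result != NULL);
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 0;
    *result = sign(strcoll(a, b));
    setlocale(LC_COLLATE, "C");
    return 1;
}
```

byte_order(A_RING, A_UML) is positive, i.e. å sorts after ä by
bytes, which is the reverse of the Finnish alphabet. Calling
locale_order("fi_FI.UTF-8", A_RING, A_UML, &r) shows whether
strcoll on a given system gets the order right, provided that
locale is installed.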

Paul

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
