Followup to:  <[EMAIL PROTECTED]>
By author:    Paul Michel <[EMAIL PROTECTED]>
In newsgroup: linux.utf8
>
> After reading a past discussion related to UTF-8
> support in glibc 2.2, I was not sure of the conclusion
> regarding strcoll.
> I understood that all char functions work on bytes.
> None of them handle UTF-8, in the sense that these
> functions do not recognise UTF-8 encoded
> characters, only bytes. Depending on what kind
> of processing they actually do, they can still correctly
> handle UTF-8 data (e.g. strcpy).
>
> IMHO, strcoll cannot correctly handle UTF-8 encoded
> characters, since collation needs explicit knowledge of
> characters. For instance, the collation rules for Finnish
> treat specially some letters that are encoded
> as more than one byte in UTF-8 (e.g. ö, 0xC3 0xB6 in
> UTF-8).
>

Since strcoll() assigns meaning to the characters in a string, it
obviously needs to decode the UTF-8 characters.  The exception, of
course, is the "C" locale, where sorting is defined to be in binary
order: UTF-8 binary order is identical to Unicode code point order.
(Fortunately; otherwise it would be very unclear what the "C" locale
should do.)

	-hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<[EMAIL PROTECTED]>
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
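
The claim above about binary order can be checked with a short
sketch.  It is written in Python rather than C purely for brevity,
and it does not call strcoll() at all: it only compares a byte-wise
sort of the UTF-8 encodings (what strcmp() would see) against a sort
by code point.  The sample strings are made up for illustration.

```python
# Sample strings (illustrative only); "ö" is U+00F6, i.e. C3 B6 in UTF-8.
words = ["zebra", "öljy", "apple", "Äiti", "€uro", "\U0001F600 smile", "oo"]

# Order by the raw UTF-8 bytes, compared as unsigned values.
# This is the view a byte-wise comparison such as strcmp() has.
by_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# Order by the sequence of Unicode code points.
by_code_points = sorted(words, key=lambda s: [ord(c) for c in s])

# The two orderings agree, which is why binary ordering needs no decoding.
same_order = by_bytes == by_code_points
print(same_order)  # True
```

Any locale whose rules differ from code point order cannot be served
by this shortcut and has to decode the characters first.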