> GHC 6.2 (shortly to be released) also supports toUpper, toLower, and the
> character predicates isUpper, isLower etc. on the full Unicode character
> set.
>
> There is one caveat: the implementation is based on the C library's
> towupper() and so on, so the support is only as good as the C library
> provides, and it relies on wchar_t being equivalent to Unicode (the
> sensible choice, but not all libcs do this).

Now, why would one want to base this on C's wchar_t and its "w" routines? wchar_t is sometimes (isolated) UTF-32 code units, as on Linux, sometimes (isolated) UTF-16 code units, as on Windows, and sometimes something utterly useless. The casing data is not reliable (it can be entirely wrong, or even locale dependent in an erroneous way), nor is it kept up to date with the Unicode character database in all implementations (even where wchar_t is some form of Unicode/10646). wchar_t is best forgotten, especially for portable programs.

Please instead use ICU's UChar32, which is (isolated) UTF-32, together with Unicode::isUpperCase(cp), Unicode::toUpperCase(cp) (C++ here), etc. The ICU data is kept up to date with Unicode versions. The case mappings are the simple ones: they use only the UnicodeData.txt case mapping data and do not take SpecialCasing.txt into account. They are thus neither locale dependent nor context dependent, and they never case-map a character to more than one character (so they are not fully appropriate for strings, but still much, much better than C's wchar_t and its w-functions).

> Proper support for character set conversions in the I/O library has been
> talked about for some time, and there are a couple of implementations

One can base this on the ICU character encoding conversions. I would very much recommend that over the C locale dependent "mb" conversion routines, for the same reasons as above.

/kent k
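
P.S. To make the difference between simple and full case mapping concrete, here is a minimal sketch in plain C++. It does not use ICU: the names CodePoint, simpleToUpper and fullToUpper are hypothetical, and the hard-coded entries are only a tiny hand-picked excerpt of what UnicodeData.txt and SpecialCasing.txt contain. A real implementation would generate its tables from those files.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// An isolated UTF-32 code point (the role UChar32 plays in ICU).
// Hypothetical alias, for illustration only.
using CodePoint = std::uint32_t;

// Simple (one-to-one) uppercase mapping, in the style of UnicodeData.txt.
// Only a few illustrative entries are covered; everything else maps to itself.
CodePoint simpleToUpper(CodePoint cp) {
    if (cp >= 0x61 && cp <= 0x7A) return cp - 0x20;  // a..z -> A..Z
    switch (cp) {
        case 0x00E9: return 0x00C9;  // e with acute -> E with acute
        case 0x03C3: return 0x03A3;  // Greek small sigma -> capital sigma
        case 0x00DF: return 0x00DF;  // sharp s: no simple uppercase mapping
        default:     return cp;
    }
}

// Full uppercase mapping, in the style of SpecialCasing.txt: the result
// may be longer than one code point. Falls back to the simple mapping.
std::vector<CodePoint> fullToUpper(CodePoint cp) {
    switch (cp) {
        case 0x00DF: return {0x53, 0x53};  // sharp s -> "SS"
        case 0xFB01: return {0x46, 0x49};  // fi ligature -> "FI"
        default:     return {simpleToUpper(cp)};
    }
}

// Small self-check showing where the two mappings disagree.
void checkExamples() {
    assert(simpleToUpper(0x00DF) == 0x00DF);
    assert(fullToUpper(0x00DF).size() == 2);
}
```

A character-level toUpper can only ever behave like simpleToUpper, since it must return a single character. Anything like fullToUpper has to work on strings instead.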