Millan wrote:
> ...
> If I intend to, for example, compare two UTF-8 encoded string which were
> converted to wchar_t strings with "mbstowcs()", and if those two strings are
> in Serbian - cyrillic unicode space (one alphabet), does it matter if I set
> locale to "en_US.UTF-8", "de_DE.UTF-8" or "sr_CS.UTF-8" (which is Serbian
> locale) since all three are UTF-8 locales or it does not since wchar_t
> strings will probably be UTF-32 encoded (sizeof(wchar_t) == 4) or UTF-16
> (sizeof(wchar_t) == 2)? If doesn't, in that case, once I convert UTF-8
> strings to wchar_t strings (UTF-32 or UTF-16 encoded strings) what locale I
> need to set, because if I set "en_US" locale it will use "ISO-8859-1"
> charset, and then again if I set to "en_US.UTF-8" locale, then it will use
> UTF-8 encoding while strings will be UTF-32 or UTF-16 encoded??

OK, first I would recommend going to unicode.org and doing some
background reading on Unicode. Don't concern yourself specifically
with UTF-8, because that is just one possible encoding of Unicode.

Unicode has four "normalization" forms, abbreviated NFC (canonical
composition), NFD (canonical decomposition), NFKC (compatibility
decomposition followed by canonical composition), and NFKD
(compatibility decomposition).

"Composed" means that a single character value is used where one
exists; for example, an A with an acute accent on top (&Aacute; in
HTML) would be Unicode character U+00C1. "Decomposed" means that
multiple values are used to build the final character: a base
character followed by one or more combining marks. For the previous
example, the characters U+0041 (A) and U+0301 (combining acute accent)
would be used instead, in that order.

So the issue is not UTF-8 vs. UTF-16 vs. UTF-32, but whether your
locale defines the necessary case mapping and collation tables for the
languages you use, and which normalization form you use for your text.

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Document Software          http://www.easysw.com
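
To make the composed vs. decomposed point concrete, here is a minimal
sketch in C. It only shows that the two spellings of A with acute
differ when compared value by value with wcscmp(), and that they match
once both are in the same form. The function toy_compose_a_acute() is
a hypothetical helper written for this illustration; it handles exactly
one pair and is not a real normalizer. Real code would use a full
normalization library. The sketch also assumes wchar_t holds one
Unicode code point per element for these values.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Composed form: U+00C1 LATIN CAPITAL LETTER A WITH ACUTE. */
const wchar_t composed_a_acute[] = { 0x00C1, 0 };

/* Decomposed form: U+0041 (A) followed by U+0301 (combining acute). */
const wchar_t decomposed_a_acute[] = { 0x0041, 0x0301, 0 };

/*
 * Hypothetical toy helper, for illustration only: rewrites every
 * U+0041 U+0301 pair in s as U+00C1, in place, and returns the new
 * length. It knows no other pair, so it is NOT an NFC implementation.
 */
size_t toy_compose_a_acute(wchar_t *s)
{
    size_t in = 0, out = 0;

    assert(s != NULL);

    while (s[in] != 0) {
        /* s[in + 1] is at worst the terminator, so this read is safe. */
        if (s[in] == 0x0041 && s[in + 1] == 0x0301) {
            s[out++] = 0x00C1;
            in += 2;
        } else {
            s[out++] = s[in++];
        }
    }
    s[out] = 0;
    return out;
}

/* Nonzero when a and b are identical value by value. */
int same_code_points(const wchar_t *a, const wchar_t *b)
{
    return wcscmp(a, b) == 0;
}
```

Copying decomposed_a_acute into a writable buffer and passing it to
same_code_points() together with composed_a_acute reports a mismatch;
after toy_compose_a_acute() has run on the buffer, the same call
reports a match. No locale is involved in either comparison, which is
why normalizing both strings to one form before comparing matters
regardless of the UTF-8 locale chosen.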

