Millan wrote:
> ...
> If I intend to, for example, compare two UTF-8 encoded string which were 
> converted to wchar_t strings with "mbstowcs()", and if those two strings are 
> in Serbian - cyrillic unicode space (one alphabet), does it matter if I set 
> locale to "en_US.UTF-8", "de_DE.UTF-8" or "sr_CS.UTF-8" (which is Serbian 
> locale) since all three are UTF-8 locales or it does not since wchar_t 
> strings will probably be UTF-32 encoded (sizeof(wchar_t) == 4) or UTF-16 
> (sizeof(wchar_t) == 2)? If doesn't, in that case, once I convert UTF-8 
> strings to wchar_t strings (UTF-32 or UTF-16 encoded strings) what locale I 
> need to set, because if I set "en_US" locale it will use "ISO-8859-1" 
> charset, and then again if I set to "en_US.UTF-8" locale, then it will use 
> UTF-8 encoding while strings will be UTF-32 or UTF-16 encoded??

OK, first I would recommend going to unicode.org and doing some
background reading on Unicode.  Don't concern yourself specifically
with UTF-8, because that is just one possible encoding of Unicode.

Unicode has four possible "normalization" forms, abbreviated NFC
(canonical composition), NFKC (compatibility decomposition followed
by canonical composition), NFD (canonical decomposition), and NFKD
(compatibility decomposition).

"Composed" means that a single character value is used, for example
an A with an acute accent on top (Á, or &Aacute; in HTML) would be
Unicode character 00C1.

"Decomposed" means that multiple values are used to compose the final
character.  For the previous example, the characters 0041 (A) and
0301 (combining acute accent) would be used instead, in that order.
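
As a small illustration of the two forms, here is a rough sketch in
Python.  Python is used only because its standard "unicodedata" module
exposes normalization directly; the standard C library has no
equivalent, so a C program would need an external library such as ICU.
The helper name code_points is made up for this example.

```python
import unicodedata

composed = "\u00C1"          # LATIN CAPITAL LETTER A WITH ACUTE
decomposed = "\u0041\u0301"  # 'A' followed by COMBINING ACUTE ACCENT


def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]


# Convert each spelling into the other normalization form.
nfd_of_composed = unicodedata.normalize("NFD", composed)
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)

print(code_points(composed), "-> NFD ->", code_points(nfd_of_composed))
print(code_points(decomposed), "-> NFC ->", code_points(nfc_of_decomposed))
```

Both spellings display as the same letter, but as raw sequences of
code points they are different strings.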

So the issue is not UTF-8 vs. UTF-16 vs. UTF-32, but whether your
locale defines the necessary case mapping tables for the languages
you use and which normalization form you use for your text.
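
To make the normalization part concrete, here is a hedged sketch
(Python again, for illustration only; the helper name same_text is
invented for this example and is not part of any library).  It
assumes a Cyrillic test character, U+0439, whose canonical
decomposition is U+0438 followed by the combining breve U+0306.  The
same text decoded from UTF-8 or from UTF-16 gives identical code
points, while the composed and decomposed spellings compare unequal
until both are brought to one normalization form.

```python
import unicodedata


def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The same character, U+0439, stored in two different encodings.
from_utf8 = b"\xd0\xb9".decode("utf-8")
from_utf16 = b"\x39\x04".decode("utf-16-le")

# The decomposed spelling: U+0438 plus COMBINING BREVE (U+0306).
decomposed = "\u0438\u0306"

print(from_utf8 == from_utf16)            # the encoding no longer matters
print(from_utf8 == decomposed)            # raw comparison of code points
print(same_text(from_utf8, decomposed))   # comparison after normalizing
```

This sketch only covers equality.  Ordering and case-insensitive
comparison still depend on the tables the locale provides.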

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Document Software          http://www.easysw.com
_______________________________________________
fltk mailing list
[email protected]
http://lists.easysw.com/mailman/listinfo/fltk
