I am not a very good teacher, but let me try anyway ...

> There's just been (yet another, sigh) round of "we want more support for
> UTF-8 than just what the utf8 package gives us" messages in the lua-l
> mailing list.
I too wish that could happen, but the bottleneck here isn't UTF-8 itself;
it is Microsoft.

> The bottom line from that discussion seems to be that full Unicode
> support (not just UTF-8) requires all sorts of extras, e.g. combining
> characters, layout direction (L-to-R or R-to-L?) directives, glyphs
> for typesetting, etc.

Why not have partial support that leaves out layout directions, glyph
shaping, and the rest? Only a small minority of languages require all of
those things anyway. As for combining diacritics, a great many accented
characters exist in precomposed form, so the lack of combining-diacritic
support is not a good excuse to skip at least partial UTF-8 support.
With plain-Jane support for Chinese, Korean, Japanese, English, Hindi,
and so on, you already cover more than 95% of the world's languages.

> I'm curious to know more about the underlying APIs and/or rationale
> behind the "Microsoft only fully supports UTF-16" statement.

Because UTF-16 is the only Unicode encoding Microsoft has said it will
support, although lately Microsoft has hinted it may add some UTF-8
support to Windows 10. Unfortunately that is a day late and a dollar
short, because it isn't retroactive.

> The UTF-8 encoding has the nice property ...

You are preaching to the choir. Now try to convince Microsoft to abandon
UTF-16 and switch over to UTF-8 instead, like everyone else knows it
should.

> So, could you please help explain to a non-MS-OS person like myself
> about how UTF-8 is a poor cousin to UTF-16 in Microsoft operating
> systems?

That is not the right interpretation. The issue is not how Unicode works,
it is how it can be supported. On Windows, UTF-8 is not a poor cousin to
UTF-16; it is a complete stranger, because Microsoft does not support it.
When it comes to character-encoding support in the Windows API, there are
only two kinds of functions: the "A" functions and the "W" functions,
where "A" stands for 8-bit ANSI and "W" stands for wide character, i.e.
UTF-16.
Ditto for string types, which come only in ANSI or UTF-16 flavors;
anything else will not be recognized by Windows. So the real problem with
UTF-8 support is that a UTF-16-only infrastructure has been built up
around Microsoft products, and it keeps UTF-8 support from ever being
easy, or even possible.

Take the standard C/C++ CRTs, for example. If you want the length of a
string, strlen() only works with ANSI strings. The UTF-16 alternative to
strlen() is wcslen(), and ... you guessed it ... it only works with
UTF-16 strings. The Windows console does not reliably print or display
Unicode characters. The filesystem Windows uses supports only UTF-16
(Windows converts internally from UTF-16 to ANSI for non-Unicode
applications). The list goes on and on, so if you compile your code to be
Unicode-compatible on Windows, it won't cross-compile for Linux or Mac,
because they don't support UTF-16; they support UTF-8.

Are you beginning to get the picture here?

Regards, Andrew

_______________________________________________
Iup-users mailing list
Iup-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/iup-users