I am not a very good teacher, but let me try anyway ...

>There's just been (yet another, sigh) round of "we want more support for
>UTF-8 than just what the utf8 package gives us" messages in the lua-l
>mailing list.

I too wish that could happen, but the bottleneck here isn't UTF-8 itself; it
is Microsoft.

>The bottom line from that discussion seems to be that full Unicode
>support (not just UTF-8) requires all sorts of extras, e.g. combining
>characters, layout direction (L-to-R or R-to-L?) directives, glyphs
>for typesetting, etc.

Why not have partial support that skips layout direction, glyph shaping,
etc.? Only a small minority of languages require all those things anyway. As
for combining diacritics, most accented characters also exist in precomposed
form, so a lack of combining-diacritic support isn't a very good excuse to
not at least partially support UTF-8. With just plain-Jane support for
Chinese, Korean, Japanese, English, Hindi, etc., you already cover the
languages of more than 95% of the world's speakers.

>I'm curious to know more about the underlying APIs and/or rationale
>behind the "Microsoft only fully supports UTF-16" statement.

Because UTF-16 is the only Unicode encoding Microsoft has said it will fully
support, although Microsoft has lately said it may be adding some UTF-8
support to Win10. Unfortunately that is a day late and a dollar short,
because it isn't retroactive.

>The UTF-8 encoding has the nice property ...

You are preaching to the choir. Now try to convince Microsoft to abandon
UTF-16 and switch over to UTF-8 instead, like everyone else knows they
should.

>So, could you please help explain to a non-MS-OS person like myself
>about how UTF-8 is a poor cousin to UTF-16 in Microsoft operating
>systems?

That is not the right interpretation. The issue isn't how Unicode works; it
is how it can be supported. UTF-8 is not a poor cousin to UTF-16 on Windows,
it is a complete stranger, because Microsoft does not support it. When it
comes to text support in the Windows API, there are only two kinds of
functions: the "A" functions and the "W" functions, where "A" stands for the
8-bit ANSI code page and "W" stands for WideChar, alias UTF-16. Ditto for
string types, which only come in ANSI or UTF-16; anything else will not be
recognized by Windows.

Therefore the real problem with UTF-8 support is that a UTF-16-monopolized
infrastructure has been built up around Microsoft products, and it prevents
UTF-8 support from ever being easy, or even possible. Take the standard
C/C++ CRT for example. If you want the length of a string, strlen() only
counts 8-bit ANSI units. The UTF-16 alternative to strlen() is called
wcslen() and ... you guessed it ... it only works with UTF-16 strings. The
Windows console does not reliably print or display Unicode chars. The
filesystem Windows uses stores names only in UTF-16 (Windows internally
converts them from UTF-16 to ANSI for non-Unicode applications). The list
goes on and on, so if you compile your code to be Unicode-compatible for
Windows, it won't be portable to Linux or Mac, because they don't support
UTF-16; they support UTF-8.

Are you beginning to get the picture here?

Regards,
Andrew


_______________________________________________
Iup-users mailing list
Iup-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/iup-users
