Re: [Iup-users] Use utf-8 source encoding rather than ISO8859-1

Andrew Robinson Tue, 17 Jul 2018 20:20:48 -0700

Most editors and compilers will work with 8-bit ANSI, 7-bit ASCII, and
multi-byte UTF-8, but that is irrelevant. What matters is how the OS will
interpret whatever you throw at it. The only two encoding choices you have
under Windows are ANSI or UTF-16 ("widechar"). There is no way around that,
and that is why I say ANSI is the lowest common denominator: Windows doesn't
do ASCII (in fact, ASCII would corrupt Windows, and ASCII was written strictly
for Western English usage). You think you are only changing a string, and then
suddenly you find yourself with built-in strings that won't be parsed
correctly by Windows functions. You will have strings that won't sort, won't
display properly, and so on.


Let's try to come up with a few different solutions to your problem, which I
am sure is not that the IUP source code won't compile unless you change the
default encoding.


On 2018-07-17 at 7:07 PM, 云风 Cloud Wu <[email protected]> wrote:


Andrew Robinson <[email protected]> wrote on Wed, 18 Jul 2018 at 6:03 AM:

Cloud Wu,

Absolutely not. In order for IUP to work with all OSes, it cannot use UTF-8,
because Microsoft only fully supports UTF-16. The lowest common denominator
is the ANSI Latin code pages.





I'm sorry that I didn't make myself clear.


My suggestion is to use only ASCII in the source code, rather than an ANSI
Latin code page (code page 1252). UTF-8 text would exist only in comments,
which are ignored by almost all compilers.


The issue with cp1252 is that many compilers decode the input source according
to the OS's current locale, so when the system locale is not cp1252, a compile
error is raised. MSVC behaves this way [1], and some versions of gcc will also
treat these Latin characters as UTF-8 by default (at least in my environment).


We can use command-line options or a pragma to specify the source character
set, but these differ between compilers. So I think the lowest common
denominator is ASCII (code point 32-127) rather than the ANSI Latin code
pages.
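
To illustrate how compiler-dependent this is, here is a minimal sketch. The
option names in the comment are the ones I know of for MSVC and GCC; the
helper micro_sign_bytes is hypothetical and only for illustration. It reports
how many bytes the compiler emitted for U+00B5, which depends on the
execution character set in effect at build time.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compiler-specific ways to state the source encoding (the spelling differs
 * per compiler, which is what makes them awkward in a portable build):
 *   MSVC:  /source-charset:.1252   or   /utf-8
 *   GCC:   -finput-charset=CP1252  (execution side: -fexec-charset=...)
 */

/* Hypothetical helper: number of bytes the compiler emitted for U+00B5. */
size_t micro_sign_bytes(void)
{
    /* The source line itself is pure ASCII; the compiler picks the bytes. */
    static const char s[] = "\u00B5";
    assert(sizeof s >= 2);          /* at least one byte plus the NUL */
    return sizeof s - 1;
}
```

With a UTF-8 execution character set the result is 2 (bytes 0xC2 0xB5); with
a single-byte set such as cp1252 it is 1. A byte escape like "\xB5" does not
vary in this way.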


My patch in the original post only changes these Latin characters in string
literals to numbers (for example, from mm('£')=80 to mm(163)=80) or to escape
sequences such as "\xB5" instead of "µ". I converted some Latin characters in
comments to UTF-8 because that is friendlier to modern editors, and it will
not affect the compiler (even an old one).
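
As a minimal sketch of the kind of change involved (the names glyph_width,
micro_unit, and the helper functions are hypothetical and not taken from the
IUP sources), the file below contains only ASCII bytes, yet the compiled
program still holds the cp1252 byte values:

```c
#include <assert.h>

/* Hypothetical table, for illustration only (not from the IUP sources). */
int glyph_width[256];

/* The micro sign written as a hex escape. The literal is split in two so
   that a following hex digit could never be absorbed into the escape. */
const char micro_unit[] = "\xB5" "m";

/* Returns 1 if every byte of s is 7-bit ASCII, 0 otherwise. */
int is_ascii_only(const char *s)
{
    for (; *s != '\0'; ++s)
        if ((unsigned char)*s > 127)
            return 0;
    return 1;
}

/* Uses the numeric code 163 (0xA3, the pound sign in cp1252) as the index
   instead of a character constant containing the pound sign itself. */
void init_glyph_width(void)
{
    glyph_width[163] = 80;
}

/* Sanity checks on the bytes the compiler actually produced. */
void check_bytes(void)
{
    assert(sizeof micro_unit == 3);                 /* 0xB5, 'm', NUL */
    assert((unsigned char)micro_unit[0] == 0xB5);
    assert(!is_ascii_only(micro_unit));             /* runtime bytes are not ASCII */
    assert(is_ascii_only("plain"));
}
```

The runtime strings are byte-for-byte what a cp1252 source file would have
produced; only the spelling in the source file changes.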


Would you please reconsider my patch?


[1] :
https://blogs.msdn.microsoft.com/vcblog/2016/02/22/new-options-for-managing-character-sets-in-the-microsoft-cc-compiler/


