PS -- This is a secret, so don't tell anybody listening: You can leave out
Unicode support in IUP and still support Unicode. For example, without
UNICODE defined, the Windows headers expand GetCommandLine into
GetCommandLineA (the ANSI version), and with UNICODE defined they expand it
into GetCommandLineW (the UTF-16 version). If your build won't let you toggle
that, you can always call the W function directly by name. You only need to do
this in a few key places and can leave the rest of the code alone.


On 2019-12-01 at 5:09 PM, Andrew Robinson <arobinso...@cox.net> wrote:
Antonio,

If you don't feel like having this discussion, let me know and I can save it
for another time...

To avoid confusion, let's get our terminology right: there is no such thing as
a "Unicode string" type. Unicode itself is a standardized enumeration of code
points that is never stored directly, because it spans 21 bits, which is not a
nice neat integer multiple of 8 bits. Instead there are standardized Unicode
encodings that are 8-bit multiples, such as UTF-8, UTF-16, UTF-32, and
GB-18030. So when you say IUP uses "Unicode strings", that has no precise
meaning (even the C language itself has no string data type; it repurposes a
pointer to a zero-terminated array of chars). From what I can tell, IUP
follows the C model and uses a zero-terminated array of bytes holding ANSI
and/or UTF-8.

This issue isn't only about fopen: there are 711 other functions in the
Windows API that do not natively support UTF-8. Microsoft has even said it
will not support UTF-8 encodings there, and that passing UTF-8 can CRASH some
functions, so they recommend against it. Why would you ignore Microsoft? Why
use the API in ways it wasn't intended to be used and isn't supported?

You have a very difficult decision to make here, Antonio. I know because I've
been there and done that. You want IUP to be universal but in order to do
that, you need to write code that will compile for both Linux and Windows. The
problem is internationalization. Linux natively uses UTF-8 and Windows
natively uses UTF-16, and the two do not go well together. So either you
hobble Windows to support UTF-8 or you hobble Linux to support UTF-16. I
guarantee you it won't work, otherwise someone would have done it already. If
I were to pick sides (and I have had to pick sides) I would choose native
Windows and hobble UTF-16 in Linux because Linux represents less than 3% of
the entire market that IUP could ever appeal to. Without good
internationalization support, only Westerners will want to use IUP. You know
this because you've seen the complaints about this already, and it will only
get worse.

That's because no matter what you do, I will always have to do translations
between UTF-8 and UTF-16 and GB-18030 and God knows what else. I don't even
want to have to think about that unless I'm reading or writing text to a
file; otherwise I want one encoding standard, not three or four. Anything
more makes the library confusing, error-prone, and almost useless.

So let's talk about ways to get IUP to support UTF-16 for Windows and UTF-8
for Linux, and we can start by letting everyone know that not everything needs
to be UTF-8 or UTF-16 in IUP. Internally IUP works just fine with ANSI or
ASCII text. It is only the textbox and file dialog I/O that need different
code. Oh, and the menu functions. Is there anything else? That isn't a lot, is
it?

And the file dialog should be simple -- just return a UTF-16 string when
Unicode is enabled and tell programmers to use _wfopen instead of fopen.
Since Linux doesn't support anything other than UTF-8, Linux can remain as is,
without any "Unicode" directive.

Also, since this does not involve code pages, that issue can be skipped
entirely.

Regards,
Andrew


On 2019-12-01 at 1:28 PM, Antonio Scuri <antonio.sc...@gmail.com> wrote:
  Yes, the problem is not the conversion itself.  


  IUP already uses Unicode strings when setting and retrieving native element
strings. The problem is that we can NOT change the IUP API without breaking
compatibility with several thousand lines of code in existing applications.
Adding a new API in parallel is something we don't have time to do, so we are
still going to use char* in our API for some time. UTF-16 would imply using
wchar_t* everywhere.


  What we can do now is provide some mechanism for the application to be able
to use the string returned by IupFileDlg in another control and in fopen.
Probably a different attribute will be necessary. Some applications I know
simply use UTF8MODE_FILE=NO, so they can pass the string to fopen, converting
it to UTF-8 before displaying it in another control.


Best,
Scuri




On Sun, Dec 1, 2019 at 12:43, Andrew Robinson <arobinso...@cox.net>
wrote:

Antonio,

It is a trivial thing to translate between UTF encodings in Windows using the
MultiByteToWideChar() function. The problem is that there are 711 API
functions in Windows, none of which support UTF-8; they directly support only
ANSI or UTF-16. Read the quote from Microsoft I posted the other day, and
notice how they said that passing a UTF-8 encoded string WILL cause some API
functions to crash your application. Worse, Microsoft has not published a
list of those unstable API functions, BECAUSE MICROSOFT HAS SAID IT WILL ONLY
SUPPORT UTF-16 and ANSI, and only certain versions of Windows 10 can support
UTF-8.

Code should not only be readable, maintainable, and modular, it should also
make sense, so if Microsoft chose UTF-16 as its Unicode standard of choice,
why complicate things by choosing a different one? Adding 711 UTF-8-->UTF-16
and 711 UTF-16-->UTF-8 translations around those 711 native Windows API
functions would make the code unreasonably complicated and cluttered. The
only time translations should be an "issue" is when you need to translate text
files from and to a different UTF encoding, but like I said, that is a trivial
thing to do.

TIP: IUP is just a GUI framework and should not be involved in any UTF
standards battles. IUP should just stick to the native encoding of the
underlying OS (ANSI or UTF-16 on Windows) and let the programmer seamlessly
call native Unicode-compliant Windows functions without any additional and
unnecessary overhead.

Regards,
Andrew


On 2019-11-30 at 11:21 PM, Pete Lomax via Iup-users
<iup-users@lists.sourceforge.net> wrote:
On Saturday, 30 November 2019, 18:49:55 GMT, Antonio Scuri
<antonio.sc...@gmail.com> wrote:

.... need conversion functions to/from UTF-8 and the filesystem encoding. Would
be nice to have a solution for that inside IUP, but for now we still don't
have one.

My own routines in Phix are 360 lines, a quick search yielded this:
https://www.montefusco.com/qs1rboostsrv/ConvertUTF.c which is 540 lines.
Not saying that's the very best, or the right licence, etc, but it should all
be reasonably straightforward??

Regards, Pete 


_______________________________________________
Iup-users mailing list
Iup-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/iup-users
