On 07/19/18 16:18, Andrew Robinson wrote:
Hi Theron,

> How is a compiler that rejects bytes 128-255 in string literals
> necessarily non-conforming? ... Now, knowing that not all compilers
> in practice do accept this cleanly.
> As you yourself pointed out in an earlier message, "the compiler should not
> try to reinterpret the byte sequences in any way". That is a truism
I was wrong to say that only 0-127 are "proper C".
Here is a relevant reference: https://en.cppreference.com/w/c/language/string_literal
In particular,

"1) /character string literal/: The type of the literal is char[], each character in
the array is initialized from the next character in s-char-sequence using the execution
character set."

However,

"The encoding of character string literals (1) and wide string literals (5) is
implementation-defined. For example, gcc selects them with the command-line options
-fexec-charset and -fwide-exec-charset."

How I interpret this is that while it is not the compiler's job to make an exact copy of the bytes ("raw" string literals are a C++ feature), it should get it right as long as the source encoding agrees with the execution encoding targeted by the compiler.  I have no reason to doubt that UTF-8 literals work just fine in Clang and GCC, or that ISO-8859-1 works in Microsoft's compiler, and I would be disappointed if each did not also support the other.

In the case of iup_str.c and iup_strmessage.c, this doesn't mean the existing C sources are okay as-is: two different encodings are used within one file.  Only one of them can be expected to work correctly at a time, and which one depends on the compiler and its command-line options.

> and since the problem only exists in some compilers and not others, it would
> make more sense to switch compilers rather than switch code just so those
> certain compilers that don't work with iup_str.c would suddenly start working.
I guess it works as-is, with no warnings, in GCC and MSVC, but it is not best for IUP portability (part of its purpose) to be restricted to these two compilers.  Keep in mind that other compilers are not necessarily wrong; in some cases the "problem" is greater strictness in enforcing source validity.

>> The strings themselves should not ever need editing to "make it work".
>> As long as the strings are already correctly encoded ISO8859-1 and
>> UTF-8
> There is no such thing as "correctly encoded" C-strings, only valid or invalid
> C-strings. Here are a few examples of some valid C-strings:
I meant to refer to the strings themselves, i.e. the 8-bit integer arrays stored at compile time, not to the literals in the source.  As is under discussion, the C language and its various implementations offer several ways to pack that byte array into the compiled library, but the array itself needs to be a valid encoded form of the intended text - this much didn't really need to be said, but it is all that I meant.

> ASCII and ANSI are so yesterday, so why are they still hanging around causing
> problems?
Following this line of reasoning, why are both UTF-8 and ISO-8859-1 needed in IUP source?  Shouldn't the project choose one for source, and convert to the other at runtime only for APIs (Microsoft?) which need it?

If a single C source file absolutely must generate code containing constants under both encodings, the options seem to come down to ASCII plus escape sequences as a lowest common denominator, short of plain hexadecimal integer arrays, which are entirely unreadable (except perhaps to a coder).  However, I would agree that this is far from an ideal solution.

Theron
_______________________________________________
Iup-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/iup-users
