Hi,
On 2022-07-18 03:19, Povilas Kanapickas wrote:
Hi John,
On 2022-07-18 05:25, John Scott wrote:
The SANE spec says that all strings are encoded in ISO-8859-1 ("Latin-
1"). However, from inspecting the code for sane_strstatus(), it appears
that it just returns ordinary string literals, which use whatever
encoding the compiler prescribes for narrow string literals and need not
be the same.
Agreed, going by the letter of the standard this is indeed a problem.
So, what character encoding should I be assuming for strings coming from
sane_strstatus() as an application writer? Since sane_strstatus() appears
to use only ASCII characters in its strings, one solution to this dilemma
is to use UTF-8 string literals, like this:
u8"Hello, world"
This would bump the compiler requirement to C11. I don't think this is bad,
because we already require C++ for at least one popular backend, so it's
unlikely we have many platforms with only an ancient C compiler available.
I'm CC'ing Ralph for a second opinion of whether we can start requiring C11.
By the way, does the current assumption actually break in practice, that
is, are there compilers for which ASCII text will not encode to a subset
of ISO-8859-1?
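One rough way to probe this question on a given compiler is to compare an
ordinary narrow literal with its u8 counterpart. The sketch below assumes a
C11 compiler; the helper name is hypothetical and the sample text is the one
from the example above, not an actual sane_strstatus() string. On a compiler
with an ASCII-compatible execution character set the two arrays should be
byte-identical, whereas on something like an EBCDIC target they would differ.

```c
#include <string.h>

/* Hypothetical helper: returns 1 when an ordinary narrow literal has the
 * same bytes as its UTF-8 (u8) counterpart on this compiler, 0 otherwise.
 * The literal contains only ASCII characters, so the UTF-8 form is also
 * valid ISO-8859-1. */
static int narrow_literal_matches_utf8(void)
{
    static const char narrow[] = "Hello, world";
    static const char utf8[]   = u8"Hello, world";  /* needs C11 or later */

    return sizeof narrow == sizeof utf8
        && memcmp(narrow, utf8, sizeof utf8) == 0;
}
```

This only tests the one literal it contains, so it is a probe rather than a
guarantee about every string in the library.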
If you can affirm that the specification needs to prevail, I can send a
merge request to adjust the string literals accordingly.
Let's wait until Ralph replies and then we can see how to proceed.
Thanks a lot for noticing this.
Regards,
Povilas
.
None of the suggestions that we have seen so far seem very portable, yet
this situation is indeed a problem.
Since UTF-8 is pretty much the de facto string representation these
days, would a better solution be to change the SANE spec to specify UTF-8?
If the currently supported text strings are the same in UTF-8 and
ISO-8859-1 then there should be no practical fallout from the change.
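To illustrate the condition above: a string has the same bytes in UTF-8 and
ISO-8859-1 exactly when every byte is 7-bit ASCII. A minimal sketch of such
a check follows; the function name is hypothetical, and it assumes the
string already arrives in an ASCII-compatible encoding.

```c
/* Hypothetical helper: returns 1 if every byte of the NUL-terminated
 * string s is in the 7-bit ASCII range (0x00..0x7F), 0 otherwise.
 * Such a string is encoded identically in ISO-8859-1 and UTF-8. */
static int is_seven_bit_ascii(const char *s)
{
    const unsigned char *p;

    for (p = (const unsigned char *)s; *p != '\0'; ++p) {
        if (*p > 0x7F)
            return 0;   /* Latin-1 high byte or part of a UTF-8 sequence */
    }
    return 1;
}
```

A frontend or a test could pass the result of sane_strstatus() for each
status code to a check like this to confirm that the current strings would
be unaffected by the change.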
What would the fallout of such a change be?
Would it make frontend support simpler?
Do any of our current frontends actually care?
Cheers,
Ralph