Eric Ackermann wrote:
> please find attached another test case (a shortened version of the
> example in the gettext proposal that Eric Blake linked). It uses the
> same mail.po and mail-utf8.po files that you provided earlier.
> When I compile and run it on Ubuntu 20.04 (Ubuntu GLIBC
> 2.31-0ubuntu9.2), for both .po files it prints "Empf?nger" in ASCII
> (converting the a-Umlaut into the question mark). This is probably
> related to the transliteration mechanism you described.

This demo.c example is not a good test case, because it does not
follow the advice to set at least the LC_MESSAGES and LC_CTYPE categories
of the locale. See
<https://www.gnu.org/software/gettext/manual/html_node/Triggering.html>
and <https://posix.rhansen.org/p/gettext_split> line 86.

What happens then is that the LC_CTYPE category of the locale is, by default,
set to "C", which implies "ASCII" encoding and no particular language or
territory. glibc's transliteration uses the language to determine the
transliteration to use. For example, it transliterates "å" to "aa" in a
Danish locale, but to "a" in an English locale. In the absence of a known
language, it falls back to "?" (like for the Chinese characters in my
previous mail).
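
To check which case a program is in, it can query the LC_CTYPE category without changing it. The following is only a minimal sketch; the helper name ctype_is_c_locale is made up for illustration and is not part of demo.c or of gettext.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the LC_CTYPE category is still the
   startup default "C" (or its alias "POSIX"), 0 otherwise. */
int ctype_is_c_locale(void)
{
    /* Passing NULL only queries the current setting; nothing is changed. */
    const char *name = setlocale(LC_CTYPE, NULL);

    return name != NULL
           && (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}
```

In a program that never calls setlocale(), this returns 1, which is the situation in which the "?" fallback applies.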

> I conclude that the different sequence in which the
> gettext-functions are called causes this behavior which I would consider
> a bug.

No, there is no bug. The documentation states that at least the LC_MESSAGES
and LC_CTYPE categories must be set for gettext() to operate reasonably.
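
The usual initialisation order could look like the sketch below. The helper name init_i18n and its localedir parameter are assumptions for illustration; the domain passed in would be "mail" for the .po files discussed here.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical helper: set up the locale first, then the message catalog.
   Returns 1 if the text domain could be selected, 0 otherwise. */
int init_i18n(const char *domain, const char *localedir)
{
    /* Sets all categories, including LC_CTYPE and LC_MESSAGES, from the
       environment (LC_ALL, LC_xxx, LANG). If the environment names an
       unsupported locale, the call fails and the "C" locale stays. */
    setlocale(LC_ALL, "");

    /* Tell gettext where the compiled .mo catalogs for this domain live. */
    bindtextdomain(domain, localedir);

    /* Select the domain used by subsequent gettext() calls. */
    return textdomain(domain) != NULL;
}
```

A program would call init_i18n("mail", dir) at the start of main(), with dir being wherever the catalogs are installed, before the first gettext() call. On non-glibc systems it may be necessary to link with -lintl.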

Bruno

