[moving this back to the thread where it belongs]

On 1/2/24, hoh...@posteo.de <hoh...@posteo.de> wrote:
> If gpic gets Ä (0xc3 0x84) it complains about 0x84.
> If gpic gets ä (0xc3 0xa4) it does not complain about 0xa4.

True, but irrelevant, because *in neither case will the character be
interpreted the way you intend*.

gpic will consider 0xc3 0x84 a valid Latin-1 character (LATIN CAPITAL
LETTER A WITH TILDE) followed by an invalid one (0x84, which falls in
the C1 control-code range).

gpic will consider 0xc3 0xa4 two valid Latin-1 characters (LATIN
CAPITAL LETTER A WITH TILDE and CURRENCY SIGN).
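To make this concrete, here is a small Python sketch (the function name
describe_as_latin1 is just for illustration) that takes the UTF-8 bytes
of Ä and ä and names each byte the way a Latin-1-only reader would see
it.  Python's unicodedata has no name for control characters, so those
are reported by their Unicode category (Cc) instead:

```python
import unicodedata


def describe_as_latin1(data: bytes) -> list[str]:
    """Describe each byte as a Latin-1-only reader would interpret it."""
    out = []
    for b in data:
        # Latin-1 maps every byte value 1:1 onto U+0000..U+00FF.
        ch = bytes([b]).decode("latin-1")
        # Control characters have no Unicode name; fall back to the category.
        name = unicodedata.name(ch, f"<control, category {unicodedata.category(ch)}>")
        out.append(f"0x{b:02x} -> {name}")
    return out


for text in ("Ä", "ä"):
    print(text, describe_as_latin1(text.encode("utf-8")))
```

For Ä this reports 0xc3 as LATIN CAPITAL LETTER A WITH TILDE and 0x84
as a control character; for ä it reports 0xc3 the same way and 0xa4 as
CURRENCY SIGN.  Neither result is the letter you typed.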

What you're trying to send to gpic in your two examples is LATIN
CAPITAL LETTER A WITH DIAERESIS and LATIN SMALL LETTER A WITH
DIAERESIS.  But if those are sent as UTF-8 to gpic, it will not
interpret them as you want.  To get what you want, you need to convert
your input to Latin-1, or run it through preconv before gpic.
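Converting to Latin-1 is a one-liner if your text only uses characters
that Latin-1 has.  A minimal Python sketch (utf8_to_latin1 is a
hypothetical helper name, and it assumes the input really is UTF-8):

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Latin-1 so groff tools see one byte per character.

    Raises UnicodeEncodeError if the text contains characters Latin-1
    lacks (the euro sign, for example); such input needs preconv instead.
    """
    return data.decode("utf-8").encode("latin-1")


print(utf8_to_latin1("Ä ä".encode("utf-8")))  # b'\xc4 \xe4'
```

Ä and ä each become the single Latin-1 byte gpic expects (0xc4 and
0xe4).  On the command line, `iconv -f UTF-8 -t ISO-8859-1` performs
the same conversion.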

> ECMA-48 says for 0x84:

Also irrelevant to groff, as it doesn't use ECMA-48.  Groff tools
(including gpic) take input in Latin-1, period.  (Pure ASCII, being a
subset of Latin-1, is also valid.)  Any bytes that aren't Latin-1
characters are illegal input to all groff tools.  The only exception
is preconv, which recognizes various encodings and converts them to
pure ASCII, with all non-ASCII characters being converted to groff
escape sequences.
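The core idea of that conversion can be sketched in a few lines of
Python.  This is only a rough imitation of what preconv emits, not its
actual implementation: it assumes the input is already decoded text,
and it skips everything else preconv does (encoding detection, the
extra lines it adds to its output).  ASCII passes through unchanged;
every other character becomes a groff \[uXXXX] escape with at least
four uppercase hex digits.  The name to_groff_escapes is made up for
this example:

```python
def to_groff_escapes(text: str) -> str:
    """Approximate preconv: keep ASCII, turn everything else into \\[uXXXX]."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # plain ASCII is already valid groff input
        else:
            out.append(f"\\[u{ord(ch):04X}]")  # e.g. ä -> \[u00E4]
    return "".join(out)


print(to_groff_escapes("Ä ä"))  # \[u00C4] \[u00E4]
```

The result is pure ASCII, so gpic never sees a non-ASCII byte at all.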

> If you want to know why I ignore preconv, read the last mail.)

I don't recall a previous message giving a reason for this, but if you
don't use preconv (or convert input to Latin-1 by some means), you're
not going to get what you want.
