[moving this back to the thread where it belongs]

On 1/2/24, hoh...@posteo.de <hoh...@posteo.de> wrote:
> If gpic gets Ä (0xc3 0x84) it complains about 0x84.
> If gpic gets ä (0xc3 0xa4) it does not complain about 0xa4.

True, but irrelevant, because *in neither case will the character be interpreted the way you intend*.

gpic will consider 0xc3 0x84 to be one valid Latin-1 character (LATIN CAPITAL LETTER A WITH TILDE) followed by one invalid character. gpic will consider 0xc3 0xa4 to be two valid Latin-1 characters (LATIN CAPITAL LETTER A WITH TILDE and CURRENCY SIGN).

What you're trying to send to gpic in your two examples is LATIN CAPITAL LETTER A WITH DIAERESIS and LATIN SMALL LETTER A WITH DIAERESIS. But if those are sent to gpic as UTF-8, it will not interpret them as you want. To get what you want, you need to convert your input to Latin-1, or run it through preconv before gpic.

> ECMA-48 says for 0x84:

Also irrelevant to groff, as it doesn't use ECMA-48. Groff tools (including gpic) take input in Latin-1, period. (Pure ASCII, being a subset of Latin-1, is also valid.) Any bytes that aren't Latin-1 characters are invalid input to all groff tools. The only exception is preconv, which recognizes various encodings and converts them to pure ASCII, with all non-ASCII characters being converted to groff escape sequences.

> If you want to know why I ignore preconv, read the last mail.)

I don't recall a previous message giving a reason for this, but if you don't use preconv (or convert your input to Latin-1 by some other means), you're not going to get what you want.
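
To make the byte-level point concrete, here is a small Python sketch. It is only an illustration of how a Latin-1 reader would see UTF-8 bytes, not anything groff itself runs, and the names as_latin1, to_latin1_bytes and NO_NAME are made up for this example.

```python
import unicodedata

# Placeholder for bytes that Unicode does not name (e.g. the C1 control range
# 0x80-0x9F, which is where 0x84 falls).
NO_NAME = "<unnamed control character>"


def as_latin1(text):
    """Encode text as UTF-8, then report how each byte reads as Latin-1."""
    seen = []
    for byte in text.encode("utf-8"):
        ch = bytes([byte]).decode("latin-1")  # Latin-1 maps every byte to one character
        seen.append((hex(byte), unicodedata.name(ch, NO_NAME)))
    return seen


def to_latin1_bytes(text):
    """Re-encode text as Latin-1, one way to 'convert your input to Latin-1'."""
    return text.encode("latin-1")


if __name__ == "__main__":
    # U+00C4 is A WITH DIAERESIS (capital), U+00E4 is the small letter.
    for ch in ("\u00c4", "\u00e4"):
        print(unicodedata.name(ch), "as UTF-8, read as Latin-1:")
        for byte, name in as_latin1(ch):
            print("   ", byte, name)
        print("    as Latin-1 it is the single byte", to_latin1_bytes(ch).hex())
```

In both cases the first byte reads as LATIN CAPITAL LETTER A WITH TILDE. The only difference is whether the second byte happens to be a printable Latin-1 character (0xa4) or not (0x84), which is why only one of the two draws a complaint even though both are misread.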