As an experiment, I tried building groff from source (from the git
repo) after converting all Latin-1 files to UTF-8.

The build appeared to succeed, but there were about 9000 lines of
diagnostics about invalid input characters.

So obviously a naive wholesale conversion isn't going to work.

Apparently groff doesn't do well with UTF-8 input. I'd like to
see that changed, but I don't know nearly enough about groff to
even start that work, or to speculate about whether it would be a
good idea.

Meanwhile, I suggest converting only files that are treated as
plain text (NEWS, ChangeLog.*, */README, etc.), just to make things
a bit easier for human readers.
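
For illustration, here is a minimal sketch of what that limited
conversion could look like, assuming Python 3 is available. The
function names and the pattern list are hypothetical, not part of
groff's build system, and the "is it already UTF-8?" check is only a
heuristic: a Latin-1 file whose high bytes happen to form valid UTF-8
sequences would be left alone.

```python
from pathlib import Path

# Hypothetical pattern list, based on the plain-text files named above.
PLAIN_TEXT_PATTERNS = ["NEWS", "ChangeLog.*", "*/README"]


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8.

    Input that already decodes as UTF-8 (including pure ASCII) is
    returned unchanged, so running this twice is harmless.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Every byte value is valid Latin-1, so this decode cannot fail.
        return data.decode("latin-1").encode("utf-8")


def convert_file(path: Path) -> bool:
    """Convert one file in place; return True if it was rewritten."""
    original = path.read_bytes()
    converted = latin1_to_utf8(original)
    if converted == original:
        return False
    path.write_bytes(converted)
    return True


def convert_tree(root: Path, patterns=PLAIN_TEXT_PATTERNS) -> list:
    """Convert every regular file under root matching one of the patterns."""
    changed = []
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            if path.is_file() and convert_file(path):
                changed.append(path)
    return changed
```

Calling convert_tree(Path(".")) from the top of a source checkout
would rewrite only the matching files and return the list of paths it
changed, which could then be reviewed with git diff before committing.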

Thoughts?


On Fri, Apr 10, 2026 at 11:53 AM Keith Thompson wrote:
>
> There are a number of Latin-1 (ISO 8859-1) files in the groff source 
> distribution.
>
> I suggest that it would be better for most or all of these to be converted to 
> UTF-8. On my system, these files do not display correctly, since I have my 
> system configured to use UTF-8 by default. I think most people are likely to 
> be in a similar situation.
>
> For example, line 56 of the NEWS file appears on my system as:
>
>    `WE` no longer re�nable it.  This change makes groff mm consistent
>
> Converting to UTF-8, it appears as:
>
>    `WE` no longer reënable it.  This change makes groff mm consistent
>
> If there's a consensus that these files should be converted to UTF-8, I 
> volunteer to submit a patch.
>
> I haven't closely examined all the relevant files, but for example I'm not 
> sure what to do about tmac/fr.tmac lines 160-174. (The same file also has a 
> Latin-1 accented letter in a comment on line 5.)
>
> -- Keith Thompson
