URL: <https://savannah.gnu.org/bugs/?66671>
Summary: explore removal of "cset.{h,cpp}" libgroff facility
Group: GNU roff
Submitter: gbranden
Submitted: Thu 16 Jan 2025 12:58:57 AM UTC
Category: Core
Severity: 1 - Wish
Item Group: Lint
Status: Postponed
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Planned Release: None
_______________________________________________________
Follow-up Comments:
-------------------------------------------------------
Date: Thu 16 Jan 2025 12:58:57 AM UTC By: G. Branden Robinson <gbranden>
These class-based wrappers around the C standard library's `isalpha()`,
`isspace()` and friends make _groff_'s code a little less accessible to
experienced C/C++ programmers and don't **appear** to be delivering any
benefit.
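For readers who haven't looked at the facility, the following is a rough sketch of what a table-based character class wrapper of this general kind looks like. It is **not** _groff_'s actual code: the names `char_set` and `alpha_pred` are hypothetical, chosen only for illustration, and the real "cset.h" interface may differ.

```cpp
#include <cctype>
#include <climits>

// Hypothetical predicate wrapper, so we don't take the address of a
// standard library function directly.
int alpha_pred(int c)
{
  return std::isalpha(c);
}

// Hypothetical sketch of a class-based character set: it snapshots a
// <cctype>-style predicate into a lookup table at construction time.
class char_set {
public:
  explicit char_set(int (*pred)(int)) : member()
  {
    for (int c = 0; c <= UCHAR_MAX; c++)
      member[c] = pred(c) != 0;
  }
  // Membership test; callers write `letters(c)` instead of `isalpha(c)`.
  bool operator()(unsigned char c) const { return member[c]; }
private:
  bool member[UCHAR_MAX + 1];
};
```

With something like `char_set letters(alpha_pred);`, a call such as `letters('a')` answers the same question as `isalpha('a')` did at the moment the table was built, which is the sense in which the wrapper adds indirection without obvious benefit.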
I have a **guess** for why they're here.
They date back to the dawn of the repository (1991), and I'll bet they go all
the way back to 1989.
That was before the standard C library grew i18n/l10n support. That came with
ISO C95, a mostly overlooked revision of the language. Before that time, you
could only count on the standard C "ctype.h" functions to tell you what was
true of ASCII.
Possibly, Clark needed this for ISO 8859-1 support; GNU _troff_ assumed an
8-bit input coding coming out of the gate, and ISO Latin-1 was known to be only
one of several possibilities.[1] However, as far as I know, a need for the
locale-specific ctype functions/classes never eventuated. Alternative input
character encodings were handled by issuing `trin` requests to the formatter
at startup (through macro files loaded via _troffrc_).
Nowadays, C requires that if you've called `setlocale()`, those functions will
tell you the truth applicable to your character encoding (and/or language).
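A minimal sketch of calling the standard functions directly, assuming nothing beyond `<cctype>` and `<clocale>`. The helper names `is_letter` and `use_locale` are hypothetical and not part of _groff_.

```cpp
#include <cctype>
#include <clocale>

// Hypothetical helper: ask the standard library whether a byte is a
// letter according to the current LC_CTYPE locale.  The cast to
// unsigned char matters; passing a negative value (other than EOF) to
// isalpha() is undefined behavior.
bool is_letter(char c)
{
  return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: select the character classification locale.
// Returns false if the requested locale is not available.
bool use_locale(const char *name)
{
  return std::setlocale(LC_CTYPE, name) != nullptr;
}
```

In the default "C" locale, `is_letter` is true only for the 52 unaccented ASCII letters, so a byte such as 0xE9 is rejected. After a successful `use_locale("")` under an ISO 8859-1 environment, the same byte would be expected to classify as a letter, but that depends on which locales the system has installed.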
Further, it's my intention to rip support for ISO 8859-1 per se _out_ of the
formatter and then, after a deprecation cycle, make it interpret UTF-8
instead. (A dead period of ASCII-only support is, I suspect, advisable to
reduce the amount of mojibake produced by users' old Latin-X documents.)
And another thing! This sort of work, even if it still needs to be done, is
not a concern specific to _groff_. It should be offloaded to _gnulib_ or
similar.
Postponing to the 1.25 release cycle.
I am **not** certain of my historical surmises above. They should be better
substantiated before this proceeds.
[1] And it needed to know which code points in the upper half of ISO 8859-1
were letters (_isalpha_()) because those could be candidate hyphenation points
whereas their complement would not be. The direction of GNU _troff_'s
development of hyphenation is to base hyphenation codes on _language_, not
_character encoding_.
