[bug #66671] explore removal of "cset.{h,cpp}" libgroff facility

G. Branden Robinson Wed, 15 Jan 2025 16:59:18 -0800

URL:
  <https://savannah.gnu.org/bugs/?66671>


                 Summary: explore removal of "cset.{h,cpp}" libgroff facility
                   Group: GNU roff
               Submitter: gbranden
               Submitted: Thu 16 Jan 2025 12:58:57 AM UTC
                Category: Core
                Severity: 1 - Wish
              Item Group: Lint
                  Status: Postponed
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None


    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: Thu 16 Jan 2025 12:58:57 AM UTC By: G. Branden Robinson <gbranden>
These class-based wrappers around the C standard library's `isalpha()`,
`isspace()` and friends make _groff_'s code a little less accessible to
experienced C/C++ programmers and don't **appear** to be delivering any
benefit.
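
To make the comparison concrete, here is a minimal sketch of the general shape such a wrapper takes, next to the direct call it stands in for.  The names `char_set` and `is_letter_direct` are hypothetical and chosen for illustration; this is not the actual content of "cset.h".

```cpp
#include <cassert>
#include <cctype>
#include <climits>

// Hypothetical table-backed character class; NOT the real cset.h code.
class char_set {
public:
  // Fill the table by asking a <cctype>-style predicate about every
  // possible byte value once, at construction time.
  explicit char_set(int (*predicate)(int)) {
    assert(predicate != nullptr);
    for (int c = 0; c <= UCHAR_MAX; c++)
      member[c] = predicate(c) != 0;
  }

  // Membership test, invoked like a function: letters(ch).
  bool operator()(unsigned char c) const { return member[c]; }

private:
  bool member[UCHAR_MAX + 1];
};

// The direct equivalent.  The cast keeps bytes >= 0x80 from reaching
// isalpha() as negative values, which would be undefined behavior.
inline bool is_letter_direct(char c) {
  return std::isalpha(static_cast<unsigned char>(c)) != 0;
}
```

A caller would build the table once, for example `char_set letters([](int c) { return std::isalpha(c); });`, and then write `letters(ch)` wherever `is_letter_direct(ch)` would otherwise do.  Note that a table built this way freezes the answers of whichever locale was active at construction.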

I have a **guess** for why they're here.

They date back to the dawn of the repository (1991), and I'll bet they go all
the way back to 1989.

This is before the standard C library's i18n/l10n support was widely
available.  `setlocale()` was standardized in ANSI C89, and wide-character
support followed in ISO C95, a mostly overlooked amendment to the language
standard.  Before that time, you could only count on the standard C "ctype.h"
functions to tell you what was true of ASCII.

Possibly, Clark needed this for ISO 8859-1 support; GNU _troff_ assumed an
8-bit input coding coming out of the gate and ISO Latin-1 was known to be only
one of several possibilities.[1]  However, as far as I know, a need for the
locale-specific ctype functions/classes never eventuated.  Alternative input
character encodings were handled by issuing `trin` requests to the formatter
at startup (through macro files loaded via _troffrc_).

Nowadays, C requires that if you've called `setlocale()`, those functions will
tell you the truth applicable to your character encoding (and/or language).
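
A small sketch of that behavior, using only `setlocale()` and `isalpha()` from the standard library; the helper names `byte_is_alpha` and `adopt_environment_ctype` are made up for illustration.  Byte 0xE9 is "é" in ISO 8859-1.  In the default "C" locale `isalpha()` rejects it, since that locale recognizes only the 26 unaccented letters in each case.  Under a Latin-1 locale I would expect it to be accepted, but which locales are installed, and what they are called, varies by system.

```cpp
#include <cctype>
#include <clocale>

// Classify one byte under whatever LC_CTYPE locale is currently active.
inline bool byte_is_alpha(unsigned char byte) {
  return std::isalpha(byte) != 0;
}

// Adopt the LC_CTYPE category from the environment (LC_ALL, LC_CTYPE,
// LANG).  Returns false if that locale is unavailable; in that case the
// program's locale is left unchanged.
inline bool adopt_environment_ctype() {
  return std::setlocale(LC_CTYPE, "") != nullptr;
}
```

Calling `byte_is_alpha(0xE9)` before and after `adopt_environment_ctype()` shows whether the environment's locale changes the answer on a given system.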

Further, it's my intention to rip support for ISO 8859-1 per se _out_ of the
formatter and then, after a deprecation cycle, make it interpret UTF-8
instead.  (A dead period of ASCII-only support is, I suspect, advisable to
reduce the amount of mojibake produced by users' old Latin-X documents.)

And another thing!  This sort of work, even if it still needs to be done, is
not a concern specific to _groff_.  It should be offloaded to _gnulib_ or
similar.

Postponing to the 1.25 release cycle.

I am **not** certain of my historical surmises above.  They should be better
substantiated before this proceeds.

[1] And it needed to know which code points in the upper half of ISO 8859-1
were letters (_isalpha_()) because those could be candidate hyphenation points
whereas their complement would not be.  The direction of GNU _troff_'s
development of hyphenation is to base hyphenation codes on _language_, not
_character encoding_.







    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66671>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
