URL: <https://savannah.gnu.org/bugs/?68255>
Summary: [troff] refactor `read_()` functions to consistently
return `bool` and store datum in argument
Group: GNU roff
Submitter: gbranden
Submitted: Sat 18 Apr 2026 03:03:09 PM UTC
Category: Core
Severity: 2 - Minor
Item Group: Refactoring
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Unlocked
Planned Release: None
_______________________________________________________
Follow-up Comments:
-------------------------------------------------------
Date: Sat 18 Apr 2026 03:03:09 PM UTC By: G. Branden Robinson <gbranden>
This is already close to being the case. Review the functions and make them
as consistent as possible.
Grepping my working copy (which has seen some renames since my last push)
shows that most input stream scanning functions already return a `bool`
indicating whether the input was lexically analyzable.
$ git grep 'bool read_' src/roff/troff/
src/roff/troff/env.cpp:static bool read_hyphenation_exception_word(unsigned char *word,
src/roff/troff/hvunits.h:extern bool read_vunits(vunits *, unsigned char si);
src/roff/troff/hvunits.h:extern bool read_hunits(hunits *, unsigned char si);
src/roff/troff/hvunits.h:extern bool read_vunits(vunits *, unsigned char si, vunits prev_value);
src/roff/troff/hvunits.h:extern bool read_hunits(hunits *, unsigned char si, hunits prev_value);
src/roff/troff/input.cpp:static bool read_delimited_measurement(units * /* n */,
src/roff/troff/input.cpp:static bool read_delimited_measurement(units * /* n */,
src/roff/troff/input.cpp:static bool read_line_rule_expression(units * /* res */,
src/roff/troff/input.cpp:static bool read_size(int *);
src/roff/troff/input.cpp:static bool read_delimited_measurement(units *n,
src/roff/troff/input.cpp:static bool read_delimited_measurement(units *n, unsigned char si)
src/roff/troff/input.cpp:static bool read_line_rule_expression(units *n, unsigned char si,
src/roff/troff/input.cpp:static bool read_size(int *x) // \s
src/roff/troff/node.cpp:static bool read_font_identifier(font_lookup_info *finfo)
src/roff/troff/number.cpp:bool read_vunits(vunits *res, unsigned char si) // TODO: grochar
src/roff/troff/number.cpp:bool read_hunits(hunits *res, unsigned char si) // TODO: grochar
src/roff/troff/number.cpp:bool read_measurement(units *res,
src/roff/troff/number.cpp:bool read_integer(int *res)
src/roff/troff/number.cpp:bool read_vunits(vunits *res,
src/roff/troff/number.cpp:bool read_hunits(hunits *res,
src/roff/troff/number.cpp:bool read_measurement_crement(units *res,
src/roff/troff/number.cpp:bool read_integer_crement(int *res, int operand)
src/roff/troff/token.h:extern bool read_measurement(units * /* result */,
src/roff/troff/token.h:extern bool read_integer(int *result);
src/roff/troff/token.h:extern bool read_measurement_crement(units * /* result */,
src/roff/troff/token.h:extern bool read_integer_crement(int * /* result */,
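The convention these declarations share (return a `bool`, store the datum through a pointer argument) can be sketched as follows. This is a minimal illustration and not GNU troff code: `read_integer_from()` and its `cursor` argument are hypothetical stand-ins, since the real functions consume troff's input stream rather than a C string.

```cpp
#include <cassert>
#include <cctype>
#include <climits>

// Hypothetical stand-in for a troff `read_*()` function.  It scans an
// optionally signed decimal integer at `*cursor`, stores it in
// `*result`, and returns whether the input was lexically analyzable.
// On failure, `*result` and `*cursor` are left untouched.
bool read_integer_from(const char **cursor, int *result)
{
  assert(cursor != nullptr && *cursor != nullptr && result != nullptr);
  const char *p = *cursor;
  bool is_negative = false;
  if (*p == '-' || *p == '+') {
    is_negative = (*p == '-');
    p++;
  }
  if (!isdigit(static_cast<unsigned char>(*p)))
    return false; // no digits: not analyzable as an integer
  int value = 0;
  while (isdigit(static_cast<unsigned char>(*p))) {
    int digit = *p - '0';
    if (value > (INT_MAX - digit) / 10)
      return false; // the next digit would overflow `int`
    value = value * 10 + digit;
    p++;
  }
  *result = is_negative ? -value : value;
  *cursor = p; // advance past what was consumed
  return true;
}
```

A caller tests the return value first and uses the stored datum only on success, so no in-band sentinel value is needed to signal failure.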
A few functions don't return anything. That might make sense in some cases,
as with the `rd` request handler, which perhaps cannot fail. Annotate as
appropriate.
$ git grep 'void read_' src/roff/troff/
src/roff/troff/input.cpp:static void read_drawing_command_color_arguments(token &);
src/roff/troff/input.cpp:void read_from_terminal_request() // .rd
src/roff/troff/input.cpp:void read_title_parts(node **part, hunits *part_width)
src/roff/troff/input.cpp:static void read_drawing_command_color_arguments(token &start)
src/roff/troff/node.cpp:static void read_special_font_identifiers(special_font_list **sp)
src/roff/troff/token.h:extern void read_title_parts(node **part, hunits *part_width);
There's only one prominent outlier.
$ git grep 'int read_' src/roff/troff/
src/roff/troff/input.cpp:static int read_character_in_copy_mode(node ** /* nd; 0 to discard */,
src/roff/troff/input.cpp:static int read_character_in_copy_mode(node **nd,
src/roff/troff/input.cpp:static unsigned int read_color_channel_value(const char *scheme,
The first is a _very_ special case. My working copy has this annotation:
// In copy mode, we don't tokenize normally; characters on the input
// stream are typically read into the contents of an existing node (like
// a string or macro definition), or discarded. A handful of escape
// sequences (\n, etc.) interpolate as they do outside of copy mode.
//
// XXX: This is one of the places where the rubber meets the road in the
// "migrate GNU troff from reading unsigned chars to UTF-8" project,
// because it returns an `int` and therefore can encode `EOF`, which the
// rest of the code uses in a traditional C-idiomatic way.
//
// That idiom seems bad for us: reading a UTF-8 sequence adds a whole
// layer of additional state because situations like a UTF-8 sequence
// being invalid (e.g., possessing an overlength encoding), incomplete,
// or outside the encoding range can happen. Even if some gnulib module
// nicely wraps up and handles all that madness for us (and I think/hope
// it does), there are still going to be exceptional conditions that are
// impossible with a single-byte character encoding where all code point
// values are valid (for reading purposes--not necessarily to GNU
// troff). To be useful, gnulib (or whatever external UTF-8-chomping
// library) has to communicate error information up to the application.
//
// Due to the variety of exceptional conditions, we might want to throw
// and catch exceptions instead.
//
// Another place (_the_ other place?) is of course reading an input
// character _not_ in copy mode--in interpretation mode, if you will.
// Unfortunately that is done ad hoc wherever a lexical analysis
// function needs to pump the input stream. We might need a counterpart
// function, read_character(), or to make this that function, with an
// additional Boolean parameter with a default value of `false`.
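To make the exceptional conditions named in that annotation concrete, here is a rough sketch of a reader that stores the decoded code point in an argument and reports the outcome separately, so `EOF` need not be encoded in the return value. Everything here is hypothetical: `read_status` and `read_code_point()` are illustrative names, not gnulib or GNU troff API, and the sketch reads from a byte buffer rather than troff's input stream. It uses a status enumeration rather than the exceptions the annotation floats, only because that keeps the example short.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical outcome codes; a single-byte reader needs only the
// first two.
enum class read_status {
  ok, end_of_input, incomplete, invalid, overlong, out_of_range
};

// Decode one UTF-8 sequence starting at `buf[*pos]`.  On `ok`, store
// the code point in `*cp` and advance `*pos`; otherwise leave both
// untouched.
read_status read_code_point(const unsigned char *buf, std::size_t len,
                            std::size_t *pos, char32_t *cp)
{
  assert(buf != nullptr && pos != nullptr && cp != nullptr);
  if (*pos >= len)
    return read_status::end_of_input;
  unsigned char lead = buf[*pos];
  std::size_t need;  // continuation bytes promised by the lead byte
  char32_t value;    // code point accumulated so far
  char32_t minimum;  // smallest value this sequence length may encode
  if (lead < 0x80) {
    need = 0; value = lead; minimum = 0;
  } else if ((lead & 0xE0) == 0xC0) {
    need = 1; value = lead & 0x1F; minimum = 0x80;
  } else if ((lead & 0xF0) == 0xE0) {
    need = 2; value = lead & 0x0F; minimum = 0x800;
  } else if ((lead & 0xF8) == 0xF0) {
    need = 3; value = lead & 0x07; minimum = 0x10000;
  } else {
    return read_status::invalid; // stray continuation or bad lead byte
  }
  if (len - *pos - 1 < need)
    return read_status::incomplete; // fewer bytes remain than promised
  for (std::size_t i = 1; i <= need; i++) {
    unsigned char c = buf[*pos + i];
    if ((c & 0xC0) != 0x80)
      return read_status::invalid; // not a continuation byte
    value = (value << 6) | (c & 0x3F);
  }
  if (value < minimum)
    return read_status::overlong;
  if (value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF))
    return read_status::out_of_range; // beyond Unicode, or a surrogate
  *cp = value;
  *pos += need + 1;
  return read_status::ok;
}
```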
So we'll deal with that later. That leaves this guy:
src/roff/troff/input.cpp:static unsigned int read_color_channel_value(const char *scheme,
We should return the `unsigned int` through an argument, make sure we
recognize garbage in the color channel value, return `false` if it is
encountered, and update callers to recover appropriately.
