Update of bug#59442 (group groff):
Summary: [PATCH] groff.cpp: move soelim before preconv in
constructed pipeline => [PATCH] groff.cpp: move soelim before preconv in
constructed pipeline
_______________________________________________________
Follow-up Comment #5:
[comment #4 comment #4:]
> [comment #1 comment #1:]
> > 1. What happens if a .so(urced) file has a non-ASCII character
> > in its filename?
>
> This seems more likely to work with the proposed change than without
> it. Currently, with preconv preceding soelim, preconv will change the
> line ".so naïve_file" into ".so na\[u00EF]ve_file", guaranteed to
> fail. If soelim comes first, it at least has a chance of succeeding.
Another thing I want to do is specialize the formatter's logic when
handling file name arguments given to requests.
Presently, GNU troff calls the same internal function to gather an
argument that is a file name as it does to gather a *roff identifier.
Maybe that made sense in 1990, but it doesn't today. File names can
contain spaces and non-ASCII characters (in whatever encoding the file
system happens to support).
Since these arguments are used mainly as-is, handed off to standard C
library functions like `fopen()`, I don't anticipate many problems here
(O Fortuna, seize my hostage). The only exception to that I can think
of off the top of my head is the value of the `.F` register, which
interpolates a file name. We will need some way to represent such
things as output. At first blush, it seems to me that we can
interpolate spaces as-is (if you want the argument quoted, do that
yourself in context), and any unprintable non-Basic Latin bytes in
groff's \[u00xx] notation.
I say "\[u00xx]" instead of "\[uXXXX]" because we have no way of knowing
what the file system's character encoding is. Might be ISO 8859-1,
UTF-8, UTF-16BE/LE, or something else entirely.
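A minimal sketch of that interpolation rule, written in Python for brevity rather than in groff's C++. The function name `interpolate_file_name` is hypothetical, and treating bytes 0x20 through 0x7E as the printable Basic Latin range is an assumption; none of this is existing groff code.

```python
def interpolate_file_name(raw: bytes) -> str:
    """Render a file name's bytes for output (hypothetical helper).

    Assumption: bytes 0x20..0x7E (printable Basic Latin, including the
    space) pass through as-is; every other byte becomes \\[u00XX].
    """
    out = []
    for byte in raw:
        if 0x20 <= byte <= 0x7E:
            out.append(chr(byte))
        else:
            # Escape the byte value itself; no decoding is attempted,
            # because the file system's encoding is unknown.
            out.append("\\[u00%02X]" % byte)
    return "".join(out)


# The same name stored under two encodings yields different escapes.
print(interpolate_file_name("naïve file".encode("latin-1")))
print(interpolate_file_name("naïve file".encode("utf-8")))
```

The first call prints `na\[u00EF]ve file` and the second prints `na\[u00C3]\[u00AF]ve file`: the escape describes bytes, not characters, which is why the notation stays within the \[u00xx] range.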
What would be affected by this:
Requests:
cf
fp (when invoked with a 3rd argument)
hpf
hpfa
lf (when invoked with a 2nd argument)
mso
msoquiet
nx (when invoked with a 2nd argument)
open
opena
psbb
so
soquiet
trf
Escape sequences:
\O5 (but since this is mainly used internally to manage temporary files
by grohtml, maybe lazily postponing this in hope that my Grand Plan
to revise grohtml to no longer use a dedicated preprocessor is a
better idea)
Registers:
.F
It occurs to me that I hadn't actually written up this idea with these
particulars before. This should probably be a new ticket.
And probably step 1 would be a simple refactor to introduce file name
argument-gathering and -interpolating functions which initially behave
no differently than the status quo, but simply wrap existing logic for
identifier gathering and whatever one-off thing the `.F` interpolator
does.
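The shape of that first step might look like the sketch below, in Python for brevity rather than groff's C++. Every name here is hypothetical: `gather_identifier` and `current_file_name` are stand-ins for the existing logic, and their bodies (stop at the first space; return a fixed name) are assumptions made only so the example runs.

```python
def gather_identifier(text: str) -> str:
    """Stand-in for the existing identifier-gathering logic.

    Assumption: it skips leading spaces and stops at the next space.
    """
    return text.lstrip(" ").split(" ", 1)[0]


def current_file_name() -> str:
    """Stand-in for whatever the `.F` interpolator does today."""
    return "example.ms"


def gather_file_name(text: str) -> str:
    # Step 1: pure delegation, so behavior is unchanged. Later steps
    # could specialize this for spaces and non-ASCII bytes.
    return gather_identifier(text)


def interpolate_current_file_name() -> str:
    # Step 1: likewise a thin wrapper around the existing behavior.
    return current_file_name()
```

Once callers that expect a file name go through the two wrappers, later changes to file name handling touch only those functions and leave identifier gathering alone.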
Regards,
Branden
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?59442>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/