Update of bug#59442 (group groff):

                 Summary: [PATCH] groff.cpp: move soelim before preconv in
constructed  pipeline => [PATCH] groff.cpp: move soelim before preconv in
constructed pipeline

    _______________________________________________________

Follow-up Comment #5:

[comment #4 comment #4:]
> [comment #1 comment #1:]
> > 1. What happens if a .so(urced) file has a non-ASCII character
> > in its filename?
> 
> This seems more likely to work with the proposed change than without
> it.  Currently, with preconv preceding soelim, preconv will change the
> line ".so naïve_file" into ".so na\[u00EF]ve_file", guaranteed to
> fail.  If soelim comes first, it at least has a chance of succeeding.

Another thing I want to do is specialize the formatter's logic when
handling file name arguments given to requests.

Presently, GNU troff calls the same internal function to gather an
argument that is a file name as it does to gather a *roff identifier.

Maybe that made sense in 1990, but it doesn't today.  File names can
contain spaces and non-ASCII characters (in whatever encoding the file
system happens to support).

Since these arguments are used mainly as-is, handed off to standard C
library functions like `fopen()`, I don't anticipate many problems here
(O Fortuna, seize my hostage).  The only exception to that I can think
of off the top of my head is the value of the `.F` register, which
interpolates a file name.  We will need some way to represent such
things as output.  At first blush, it seems to me that we can
interpolate spaces as-is (if you want the argument quoted, do that
yourself in context), and any unprintable non-Basic Latin bytes in
groff's \[u00xx] notation.

I say "\[u00xx]" instead of "\[uXXXX]" because we have no way of knowing
what the file system's character encoding is.  Might be ISO 8859-1,
UTF-8, UTF-16BE/LE, or something else entirely.
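
To make the byte-wise idea concrete, here is a minimal sketch in C++.
It is not groff code; `interpolate_file_name()` and `check_examples()`
are hypothetical names used only for illustration.  It assumes that
"printable" means bytes 0x20 through 0x7E, and that control characters
and DEL get the same \[u00xx] treatment as bytes 0x80 and up, which is
my reading rather than a settled decision.  A backslash in the file
name is passed through untouched here; whether that is right is an
open question.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of groff: render a file name for
// interpolation one byte at a time, assuming nothing about the file
// system's character encoding.
std::string interpolate_file_name(const std::string &name)
{
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : name) {
    if (c >= 0x20 && c < 0x7F) {
      // Printable Basic Latin, including the space: pass through as-is.
      out += static_cast<char>(c);
    } else {
      // Assumption: control characters, DEL, and every byte >= 0x80
      // become \[u00XX], where XX is the byte value in hexadecimal.
      out += "\\[u00";
      out += hex[c >> 4];
      out += hex[c & 0x0F];
      out += ']';
    }
  }
  return out;
}

// Small self-check showing the intended behavior.
void check_examples()
{
  // Spaces are interpolated unchanged.
  assert(interpolate_file_name("my file.ms") == "my file.ms");
  // ISO 8859-1 "naive" with i-diaeresis: one byte, one escape.
  assert(interpolate_file_name(std::string("na") + "\xEF" + "ve_file")
         == "na\\[u00EF]ve_file");
  // The same name in UTF-8: two bytes, two escapes.
  assert(interpolate_file_name(std::string("na") + "\xC3\xAF" + "ve_file")
         == "na\\[u00C3]\\[u00AF]ve_file");
}
```

The UTF-8 case shows why the notation stays at two hex digits: the
escapes describe bytes, not code points, so no guess about the
encoding is baked into the output.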

What would be affected by this:

Requests:

cf
fp (when invoked with a 3rd argument)
hpf
hpfa
lf (when invoked with a 2nd argument)
mso
msoquiet
nx (when invoked with a 2nd argument)
open
opena
psbb
so
soquiet
trf

Escape sequences:

\O5 (but since this is mainly used internally by grohtml to manage
    temporary files, it may be a better idea to lazily postpone this
    in the hope that my Grand Plan to revise grohtml to no longer use
    a dedicated preprocessor comes to pass)


Registers:

.F

It occurs to me that I hadn't actually written up this idea with these
particulars before.  This should probably be a new ticket.

And probably step 1 would be a simple refactor to introduce file name
argument-gathering and -interpolating functions which initially behave
no differently than the status quo, but simply wrap existing logic for
identifier gathering and whatever one-off thing the `.F` interpolator
does.
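
A rough sketch of the shape of that first step, in C++.  Every name
here is hypothetical: `gather_identifier()` is a toy stand-in for the
formatter's existing identifier-gathering logic (it just reads one
space-delimited word), not the real GNU troff function, and
`gather_file_name()` is the new entry point that file-name-taking
requests would call instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy stand-in for the existing identifier-gathering logic: skip
// leading spaces, then read up to the next space or end of input.
std::string gather_identifier(const std::string &args)
{
  std::size_t start = args.find_first_not_of(' ');
  if (start == std::string::npos)
    return std::string();
  std::size_t end = args.find(' ', start);
  if (end == std::string::npos)
    return args.substr(start);
  return args.substr(start, end - start);
}

// Step 1 of the refactor: a dedicated function for file name
// arguments that simply delegates, so behavior is unchanged.  Later
// steps could change this body (for example, to accept spaces)
// without touching identifier handling.
std::string gather_file_name(const std::string &args)
{
  return gather_identifier(args);
}
```

The point of the indirection is that callers can be switched over one
at a time while results stay identical, so any later behavioral change
is confined to a single function.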

Regards,
Branden



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?59442>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/

