Re: Proposed: make \X read its argument in copy mode

2024-01-19 Thread G. Branden Robinson
[self-follow-up with correction]

At 2024-01-19T18:56:37-0600, G. Branden Robinson wrote:
> This might be more accurately stated as:
> 
> 2) \X behaves like .device used to (in groff 1.23.0 and earlier).

[correction follows]
And I repeat: this is _NOT_ a _hard_ prerequisite to expressing Unicode
sequences in the output, but it seems useful so that authors of output
drivers (and supporting macro files for them) can keep their sanity.

[elaboration]

What I mean is that we can pass Unicode between "pdf.tmac" and the
output driver _today_.  Consider the following notional macro.

.de pdfmark2
. nop \!x X ps:exec [\\$* pdfmark2
..

(The open bracket has something to do with PostScript syntax, I think.)

...and it getting called by some other macro encoding the argument...

.de pdflink
.  ds pdf*input \\$*\"
.  encode pdf*input \" performs magic transformation, like "stringhex"
.  pdfmark2 \\*[pdf*input]
..

...and I have a document using these.

.H 1 "This is my heading"
.pdflink "HI DERI 😈"

This ultimately would show up in the output as something like this.

x X ps: exec [4849204445524920F09F9888 pdfmark2

Something pretty close to that works on the deri-gropdf-ng branch today,
as I understand it.
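
The hexadecimal payload shown above matches the UTF-8 bytes of the argument written out as hex digits. Here is a minimal Python sketch, assuming that is all the "encode" step needs to do. The name `stringhex_utf8` is hypothetical and is not a groff or gropdf interface.

```python
def stringhex_utf8(text: str) -> str:
    """Return the UTF-8 bytes of `text` as uppercase hex digits."""
    return text.encode("utf-8").hex().upper()


# "HI DERI " followed by U+1F608, the code point named later in the message.
payload = stringhex_utf8("HI DERI \U0001F608")

# Assumed framing: the same "x X ps: exec [... pdfmark2" shape as above.
print(f"x X ps: exec [{payload} pdfmark2")
```

The output driver would have to reverse the transformation, which is why both sides need to agree on the convention.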

But my _suggestion_ would be that we support something more like this.

x X ps: exec [HI DERI \[u00F0]\[u009F]\[u0098]\[u0088] pdfmark2

or this...

x X ps: exec [HI DERI \[uDE08]\[uD83D] pdfmark2

...or even this...

x X ps: exec [HI DERI \[u1F608] pdfmark2

These are groffish ways of expressing UTF-8, UTF-16LE, and UTF-32,
respectively.  The reuse of groff Unicode code point escape sequence
syntax is, I would hope, more helpful than confusing.
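
To make the three forms concrete, the following Python sketch derives each sequence for U+1F608. The helper names are hypothetical, and it assumes each code unit is written as a `\[uXXXX]` escape with at least four uppercase hex digits. It emits the UTF-16 surrogate pair in code-unit order (high surrogate first), whereas the example above lists the low surrogate first.

```python
def groff_escapes(units):
    """Format integers as \\[uXXXX] escapes with at least four hex digits."""
    return "".join(f"\\[u{u:04X}]" for u in units)


def utf8_units(cp):
    """Return the UTF-8 bytes of code point `cp` as a list of integers."""
    return list(chr(cp).encode("utf-8"))


def utf16_units(cp):
    """Return the UTF-16 code units of `cp`: high surrogate, then low."""
    if cp < 0x10000:
        return [cp]
    offset = cp - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


def utf32_units(cp):
    """Return the single UTF-32 code unit of `cp`."""
    return [cp]


cp = 0x1F608
print(groff_escapes(utf8_units(cp)))   # \[u00F0]\[u009F]\[u0098]\[u0088]
print(groff_escapes(utf16_units(cp)))  # \[uD83D]\[uDE08]
print(groff_escapes(utf32_units(cp)))  # \[u1F608]
```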

My aims are that (1) people don't have to use two different escaping
conventions _within the formatter_ to get byte sequences to the output
driver, and (2) driver-supporting macro file writers don't have to
handle a bunch of special cases in device control commands.

Those factors are what drive my proposal.

Regards,
Branden




Re: Proposed: make \X read its argument in copy mode

2024-01-19 Thread G. Branden Robinson
Hi Deri,

At 2024-01-20T00:07:21+, Deri wrote:
> On Friday, 19 January 2024 21:39:57 GMT G. Branden Robinson wrote:
> > Right.  Before I craft a lengthy response to this--did you see the
> > footnote?
> 
> Yes, sorry, it didn't help. I'm just comparing output now with output
> in 1.23.0 and what you claim you are doing is the reverse of what I'm
> seeing.

I haven't yet pushed anything implementing my (new) intentions,
reflected in the subject line.  I wanted to gather feedback first.

What happened was, I thought "the `device` request and `\X` escape
sequence should behave the same, modulo the usual differences in parsing
(delimitation vs. reading the rest of the line, the leading double quote
mechanism in request form, and so forth)".

Historically, that has never been the case in groff.

Here's (the meat of) the actual test case I recently wrote and pushed.

input='.nf
\X#bogus1: esc \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]#
.device bogus1: req \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
.ec @
@X#bogus2: esc @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]#
.device bogus2: req @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]'

I know that looks hairy as hell.  I'm testing several things.

Here is what the output of that test looks like on groff 1.22.3 and
1.22.4.

x X bogus1: esc man-beast\[u1F00] -
x X bogus1: req @%man-beast\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
x X bogus2: esc man-beast@[u1F00] -
x X bogus2: req @%man-beast@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]

Observations of the above:

A.  When using `\X`, the escape sequences \%, \[u1F63C], \[aq], \[dq],
\[ga], \[ha], \[rs], \[ti] all get discarded.

B.  When you change the escape character and self-quote it in the
formatter, it comes out as-is in the device control command.  I
found this absurd, since there is no such thing as an escape
character in the device-independent output language, and whatever
escaping convention a device-specific control command needs to come
up with for things like, oh, expressing Unicode code points is
necessarily independent of a random *roff document's choice of
escape character anyway.

Here is what the test output looks like on groff 1.23.0.  It enabled a
few more characters to get rendered in PDF bookmarks.

x X bogus1: esc man-beast\[u1F00] -'"`^\~
x X bogus1: req @%man-beast\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
x X bogus2: esc man-beast@[u1F00] -'"`^\~
x X bogus2: req @%man-beast@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]

Here is what the test output looks like on groff Git HEAD.  It was my
first stab at solving the problem, the one I am now having partial
second thoughts about.

x X bogus1: esc man-beast\[u1F00] -'"`^\~
x X bogus1: req man-beast\[u1F00] -'"`^\~
x X bogus2: esc man-beast\[u1F00] -'"`^\~
x X bogus2: req man-beast\[u1F00] -'"`^\~

I was briefly happy with this, but I started wondering what happens when
you interpolate any crazy old damned string inside a device control
command and I rapidly became uncomfortable.  Because `\X` does not read
its argument in copy mode, it can get exposed to "nodes" (and in groff
Git, `device` can too)--this is that old incomprehensible nemesis that
afflicted pdfmom users relentlessly before 1.23.0.[1][2][3][4][5][6]

can't transparently output node at top level

But the reason 1.23.0 doesn't throw these errors is because I hid them,
not because we fixed them.[7]

An aim of this proposal is to truly fix them.

I hope it will surprise no one to learn that I have recently also
updated our documentation regarding tokens, nodes, how these relate to
GNU troff's input processing, and related matters.

> I hope I don't elicit a too lengthy response.

I know such hope oft seems forlorn when talking to me...

> There are 3 logical possibilities for the list to decide:-
> 
> 1) .device behaves like \X.
> 
> This seems to be what Branden has done at the moment. Disadvantage is
> that as a by-product you can't send unicode to the output drivers
> using either method,

I'm not happy with this status quo, but this doesn't exactly mean you
"can't send Unicode to output drivers".  What you have to do is _decide
upon an encoding mechanism for them_.  That will be true no matter which
way we solve this.  But I think it's best if there is _one_ way (per
output driver, anyway), not two different ones depending on whether your
encoded Unicode sequence is passed via `device` or `\X`.  This stuff is
challenging enough to the user that that seems like gratuitous cruelty.

Unfortunately that _has been_ the status quo.
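
As an illustration of what "one way per output driver" could look like on the receiving side, here is a Python sketch of a decoder for the code point form of the escape. It is an assumption for discussion, not how gropdf or any other driver works today. `decode_code_points` is a hypothetical name, and the pattern accepts only four to six uppercase hex digits.

```python
import re

# Matches \[uXXXX] with four to six uppercase hex digits.
_ESCAPE = re.compile(r"\\\[u([0-9A-F]{4,6})\]")


def decode_code_points(arg: str) -> str:
    """Replace each \\[uXXXX] escape in `arg` with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), arg)


decoded = decode_code_points(r"[HI DERI \[u1F608] pdfmark2")
# Print the UTF-8 bytes so the result is visible on any terminal.
print(decoded.encode("utf-8"))
```

With a single convention, the same decoder would serve arguments arriving from either `device` or `\X`.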

> and some escapes affect the text stream when the expectation is for
> things sent to the output driver should not affect text stream.

Right.  That is what alarmed me about reading `device` and `\X`
arguments in interpretation mode.

> 2) \X behaves like .device.
> 
> This is what Branden said was the intention. This allows 

Re: Proposed: make \X read its argument in copy mode

2024-01-19 Thread Deri
On Friday, 19 January 2024 21:39:57 GMT G. Branden Robinson wrote:
> Hi Deri,
> 
> At 2024-01-19T21:16:54+, Deri wrote:
> > On Tuesday, 16 January 2024 19:22:48 GMT G. Branden Robinson wrote:
> > > Or: Should device control commands affect the environment?
> > > 
> > > I therefore propose to change this, and have the `\X` escape sequence
> > > read its argument in copy mode.  That will make it work like the
> > > `device` request in groff 1.23.0 and earlier[1].
> > 
> > This is not what I am seeing in current 'master/head'. [...]
> 
> Right.  Before I craft a lengthy response to this--did you see the
> footnote?

Hi Branden,

Yes, sorry, it didn't help. I'm just comparing output now with output in 
1.23.0 and what you claim you are doing is the reverse of what I'm seeing.

I hope I don't elicit a too lengthy response. There are 3 logical 
possibilities for the list to decide:-

1) .device behaves like \X.

This seems to be what Branden has done at the moment. Disadvantage is that as 
a by-product you can't send unicode to the output drivers using either method, 
and some escapes affect the text stream when the expectation is for things 
sent to the output driver should not affect text stream.

2) \X behaves like .device.

This is what Branden said was the intention. This allows pdf title (normally 
shown in the window header in a pdf viewer) to use unicode.

3) Leave things as they were prior to recent commits.

It will be interesting to hear from as many people as possible which they 
think is the best option. I definitely think we should not be making the use 
of unicode harder.

Cheers 

Deri

> 
> Regards,
> Branden







Re: Proposed: make \X read its argument in copy mode

2024-01-19 Thread G. Branden Robinson
Hi Deri,

At 2024-01-19T21:16:54+, Deri wrote:
> On Tuesday, 16 January 2024 19:22:48 GMT G. Branden Robinson wrote:
> > Or: Should device control commands affect the environment?
> > 
> > I therefore propose to change this, and have the `\X` escape sequence
> > read its argument in copy mode.  That will make it work like the
> > `device` request in groff 1.23.0 and earlier[1].
> 
> This is not what I am seeing in current 'master/head'. [...]

Right.  Before I craft a lengthy response to this--did you see the
footnote?

> > [1] Earlier this week I pushed a change to make `device` read _its_
> > argument in interpretation, not copy, mode.  My second thoughts
> > about that are what prompted this proposal.
> >
> > See  for background.

Regards,
Branden




Re: Proposed: make \X read its argument in copy mode

2024-01-19 Thread Deri
On Tuesday, 16 January 2024 19:22:48 GMT G. Branden Robinson wrote:
> Or: Should device control commands affect the environment?
> 
...

> I therefore propose to change this, and have the `\X` escape sequence
> read its argument in copy mode.  That will make it work like the
> `device` request in groff 1.23.0 and earlier[1].

This is not what I am seeing in current 'master/head'. Using this as a test:-

===
.ds abc def
.br
black
\X'abc=\*[abc]\m[red]\(em\[u0431]'
red?\m[black]
.device device abc=\*[abc]\m[red]\(em\[u0431]
red?
===

With 1.23.0 it produces:-

x T ps
x res 72000 1 1
x init
p1
x font 5 TR
f5
s1
V12000
H72000
md
DFd
tblack
wh2500
V12000
H96160
mr 65535 0 0
x X abc=def
wh2500
tred?
wh5000
V12000
H120870
mr 0 0 0
x X device abc=def\m[red]\(em\[u0431]
tred?
n12000 0
x trailer
V792000
x stop

And the colour sequence of the words goes - black red black. You can also see 
the unicode character \[u0431] has been successfully passed to the 
postprocessor when using .device and also the \m[red] has not "leaked" into 
the text output stream but just passed to the postprocessor. The \X variant 
cleaned all the nodes before passing on what is left (and leaked red).
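
The reading of the output above can be reproduced mechanically. The Python sketch below is an assumption about how one might inspect the stream, not a groff tool, and `summarize_grout` is a hypothetical name. It lists only the colour commands, text commands, and device control commands in order, using an abbreviated copy of the 1.23.0 output.

```python
def summarize_grout(grout: str):
    """Yield (kind, detail) pairs for colour, text, and device control lines."""
    for line in grout.splitlines():
        if line.startswith("x X "):
            # Device control command: keep everything after "x X ".
            yield ("device", line[4:])
        elif line.startswith("m") and line[1:2] in ("c", "d", "g", "k", "r"):
            # Colour commands: mc, md, mg, mk, mr.
            yield ("colour", line)
        elif line.startswith("t"):
            # Text command: the word follows the "t".
            yield ("text", line[1:])


# Abbreviated from the 1.23.0 output above (positioning commands omitted).
sample = """\
md
tblack
mr 65535 0 0
x X abc=def
tred?
mr 0 0 0
x X device abc=def\\m[red]\\(em\\[u0431]
tred?
"""

for kind, detail in summarize_grout(sample):
    print(f"{kind:7} {detail}")
```

Read in order, the colour lines show default, red, then black again, and the second device line carries the escapes through untouched.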

Now on current master which contains the changes on which you are asking us to 
comment, this is the result:-

x T ps
x res 72000 1 1
x init
p1
troff:X.trf:4: error: special character 'em' is invalid within a device control command
troff:X.trf:4: error: special character 'u0431' is invalid within a device control command
troff:X.trf:6: error: special character 'em' is invalid within a device control command
troff:X.trf:6: error: special character 'u0431' is invalid within a device control command
x font 5 TR
f5
s1
V12000
H72000
md
DFd
tblack
wh2500
V12000
H96160
mr 65535 0 0
x X abc=def
wh2500
tred?
wh5000
V12000
H120870
x X device abc=def
tred?
n12000 0
x trailer
V792000
x stop

Now we can see that both \X and .device are behaving the same way as \X used
to (with the addition of a new error to document that the facility to pass
unicode characters, and others, has been withdrawn). Plus, both methods are
now a leaky red!

You appear to have achieved the exact opposite of what you set out to achieve 
- "make it (\X) work like the device request in 1.23.0 and earlier". I think 
your instincts are correct, once you have completed your for loop the removal 
of unwanted nodes from a string will be simple, so it would not be necessary 
to rely on \X doing it for you. The device request currently operates as \X is 
documented in CSTR #54 so it makes sense to have our \X behave the same.

Usually it is better to preserve data rather than arbitrarily discard it so 
that it can't be recovered, so I agree with your desire to make \X behave like 
.device has always behaved, but possibly after your "for" request is ready so 
people have a simple way of choosing the current behaviour, i.e. removing 
nodes from a string or passing the string as a whole.

https://savannah.gnu.org/bugs/?63074, which is titled "develop convention
for encoding Unicode character sequences for passage to device control
commands", shows you understand the necessity of having the ability to pass
all unicode and other characters to postprocessors and are aware that the
device request was already capable of doing that. I have no objection to you
extending this capability to \X if that is your wont, but the current state
of master is the opposite.

Cheers

Deri







[bug #65180] [xditview, groff] warnings from -Wanalyzer-null-dereference -fanalyzer

2024-01-19 Thread Bjarni Ingi Gislason
Follow-up Comment #2, bug#65180 (group groff):

   The whole file is 23401 bytes.  The following is the part about
"groff/pipeline.c" and is in the attachment.


(file #55587)

___

Additional Item Attachment:

File name: groff.analyzer.pipeline.bug  Size: 10 KB
   







Re: [TUHS] Re: Original print of V7 manual? / My own version of troff

2024-01-19 Thread G. Branden Robinson
Hi Lennart,

At 2024-01-18T15:45:55+, Lennart Jablonka wrote:
> Quoth John Gardner:
> > Thanks for reminding me, Branden. :) I've yet to get V7 Unix working with
> > the latest release of SimH, so that's kind of stalled my ability to develop
> > something in K&R C.
> 
> I went ahead and wrote a little C/A/T-to-later-troff-output converter in
> v7-friendly and C89-conforming C:
> 
> https://git.sr.ht/~humm/catdit

This is an exciting prospect but I can't actually see anything there.

I get an error.

"401 Unauthorized

You don't have the necessary permissions to access this page. Index"

> I’m not confident in having got the details of spacing right (Is that
> 55-unit offset when switching font sizes correct?)

I've never heard of this C/A/T feature/wart before.  Huh.

> and the character codes emitted are still those of the C/A/T,
> resulting in the wrong glyphs being used.

The codes should probably be remapped by default, with a command-line
option to restore the original ones.  I would of course recommend
writing out 'C' commands with groff special character names.

> I created the attached document like this:
> 
>   v7$ troff -t /usr/man/man0/title >title.cat
>   host$ catdit title.ps
> 
> (Where do the two blank pages at the end come from?)

Good question; we may need to rouse a C/A/T expert.

> PS: Branden, for rougher results, if you happen to have a Tektronix
> 4014 at hand (like the one emulated by XTerm), you can use that to
> look at v7 troff’s output.  Tell your SIMH to pass control bytes
> through and run troff -t | tc.

I'd love to, just please make your repo available to the public.  :)

Regards,
Branden




[bug #65180] [xditview, groff] warnings from -Wanalyzer-null-dereference -fanalyzer

2024-01-19 Thread G. Branden Robinson
Update of bug#65180 (group groff):

Severity:  3 - Normal => 2 - Minor  
 Summary: make output with CFLAGS +=
-Wanalyzer-null-dereference  -fanalyzer => [xditview,groff] warnings from
-Wanalyzer-null-dereference  -fanalyzer

___

Follow-up Comment #1:

The attachment did not behave well for me.  The content was only ~24KiB.


Subject: output with CFLAGS += -Wanalyzer-null-dereference
 -fanalyzer

[...]
  CC   src/devices/xditview/gxditview-device.o
../src/devices/xditview/device.c: In function 'find_file':
../src/devices/xditview/device.c:468:5: warning: check of 'path' for NULL after already dereferencing it [-Wanalyzer-deref-before-check]
  468 | strcat(path, env);
  | ^
  'find_file': events 1-4
|
|  466 |   *path = '\0';
|  |   ~~^~
|  | |
|  | (1) pointer 'path' is dereferenced here
|  467 |   if (env && *env) {
|  |  ~   
|  |  |
|  |  (2) following 'true' branch...
|  468 | strcat(path, env);
|  | ~
|  | |
|  | (3) ...to here
|  | (4) pointer 'path' is checked for NULL here but it was already dereferenced at (1)
|
../src/devices/xditview/device.c:468:5: warning: check of 'path' for NULL after already dereferencing it [-Wanalyzer-deref-before-check]
  468 | strcat(path, env);
  | ^
  'open_device_file': events 1-5
|
|  525 | FILE *open_device_file(const char *device_name, const char *file_name,
|  |   ^~~~
|  |   |
|  |   (1) entry to 'open_device_file'
|..
|  531 |   buf = XtMalloc(3 + strlen(device_name) + 1 + strlen(file_name) + 1);
|  |  ~~~   ~
|  |  | |
|  |  | (3) ...to here
|  |  (2) following 'false' branch...
|  532 |   sprintf(buf, "dev%s/%s", device_name, file_name);
|  |   
|  |   |
|  |   (4) following 'false' branch (when 'buf' is non-NULL)...
|  |   (5) inlined call to 'sprintf' from 'open_device_file'
|
+--> 'sprintf': event 6
   |
   |/usr/include/x86_64-linux-gnu/bits/stdio2.h:30:10:
   |   30 |   return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
   |  | ^~
   |  |  |
   |  |  (6) ...to here
   |   31 |   __glibc_objsize (__s), __fmt,
   |  |  ~
   |   32 |   __va_arg_pack ());
   |  |   ~
   |
<--+
|
  'open_device_file': event 7
|
|../src/devices/xditview/device.c:533:8:
|  533 |   fp = find_file(buf, result);
|  |^~
|  ||
|  |(7) calling 'find_file' from 'open_device_file'
|
+--> 'find_file': events 8-12
   |
   |  454 | FILE *find_file(const char *file, char **result)
   |  |   ^
   |  |   |
   |  |   (8) entry to 'find_file'
   |..
   |  466 |   *path = '\0';
   |  |   
   |  | |
   |  | (9) pointer 'path' is dereferenced here
   |  467 |   if (env && *env) {
   |  |  ~ 
   |  |  |
   |  |  (10) following 'true' branch...
   |  468 | strcat(path, env);
   |  | ~
   |  | |
   |  | (11) ...to here
   |  | (12) pointer 'path' is checked for NULL here but it was already dereferenced at (9)
   |
../src/devices/xditview/device.c:468:5: warning: check of 'path' for NULL after already dereferencing it [-Wanalyzer-deref-before-check]
  468 | strcat(path, env);
  | ^
  'load_font': events 1-2
|
|  186 | DeviceFont *load_font(Device *dev, const char *name)
|  | ^
|  | |
|  | (1) entry to 'load_font'
|..
|  193 | fp = open_device_file(dev->name, name, _filename);
|  |  
|  |  |
|  |  (2) calling 'open_device_file' from 'load_font'
|
+--> 'open_device_file': events 3-7
   |
   |  525 | FILE