Re: Proposed: make \X read its argument in copy mode

2024-01-23 Thread G. Branden Robinson
Hi Deri,

At 2024-01-23T11:32:02+, Deri wrote:
> Just to be sure, can you confirm your intention is to return .device
> to its 1.23.0 state, and mirror that behaviour for \X,

Yes, that is my intention.

> so we will have no more red seepage.

Not sure what that is, so I can't promise it.  Does it have to do with
the phase of the moon?  :P

Regards,
Branden




Re: Proposed: make \X read its argument in copy mode

2024-01-23 Thread Deri
On Tuesday, 23 January 2024 02:46:50 GMT G. Branden Robinson wrote:
> [self-follow-up]
> 
> > Or: Should device control commands affect the environment?
> > 
> > Recall the definition of the \X escape sequence from CSTR #54 (1992).
> > 
> > 10.7.  Transparent output.  The sequence \X'anything' copies anything
> > to the output, as a device control function of the form x X anything
> > (§22).  Escape sequences in anything are processed.
> 
> [...]
> 
> > I therefore propose to change this, and have the `\X` escape sequence
> > read its argument in copy mode.  That will make it work like the
> > `device` request in groff 1.23.0 and earlier[1].
> 
> It's looking like we _will_ be giving up something with this change:
> 
> The ability to use a newline as an escape sequence delimiter with the \X
> escape sequence.
> 
> I would argue that this change is of vanishingly small impact.
> 
> 1.  Likely few people knew you could use a newline as a delimiter with
> this escape sequence in the first place.
> 2.  You couldn't do that in DWB troff anyway.[1]
> 3.  The opposite problem is of greater interest to practical users:
> _embedding_ newlines inside device control commands.  \X didn't
> support that anyway, neither in DWB troff nor groff.[2]

Hi Branden,

Just to be sure, can you confirm your intention is to return .device to its 
1.23.0 state, and mirror that behaviour for \X, so we will have no more red 
seepage.

Cheers 

Deri






Re: Proposed: make \X read its argument in copy mode

2024-01-22 Thread G. Branden Robinson
[self-follow-up]

> Or: Should device control commands affect the environment?
>
> Recall the definition of the \X escape sequence from CSTR #54 (1992).
>
>   10.7.  Transparent output.  The sequence \X'anything' copies anything
>   to the output, as a device control function of the form x X anything
>   (§22).  Escape sequences in anything are processed.
[...]
> I therefore propose to change this, and have the `\X` escape sequence
> read its argument in copy mode.  That will make it work like the
> `device` request in groff 1.23.0 and earlier[1].

It's looking like we _will_ be giving up something with this change:

The ability to use a newline as an escape sequence delimiter with the \X
escape sequence.

I would argue that this change is of vanishingly small impact.

1.  Likely few people knew you could use a newline as a delimiter with
this escape sequence in the first place.
2.  You couldn't do that in DWB troff anyway.[1]
3.  The opposite problem is of greater interest to practical users:
_embedding_ newlines inside device control commands.  \X didn't
support that anyway, neither in DWB troff nor groff.[2]

But we have a bug report about this and a test case (that I just broke),
so I thought I'd mention it.[3]

(Also in NEWS if and when this lands, of course.)

I'm starting to see why the \? escape sequence uses itself as its own
terminating delimiter.  It is, as far as I can tell, the only escape
sequence that reads its argument in copy mode.  When you accept
delimiters that aren't necessarily characters but have been tokenized,
while the whole point of copy mode is to read characters _rather than_
tokens (with a few exceptions), things get interesting.

We haven't heard from Werner in a while.  I wonder if he's out there
getting a chuckle watching me crash into dusty corners and weird warts.

...or if he's trying hard to repress memories of the painful,
Lovecraftian horrors of underspecification I'm blundering into...

:-O

Regards,
Branden

[1]

$ printf '\\X#device: abc#\n.tm all done\n' | DWBHOME=. ./bin/troff \
  | grep -C1 'x X'
all done
V120
x X device: abc
n120 0

[2]

$ printf '\\X#ps: foo\nbar#\n.tm all done\n' | DWBHOME=. ./bin/troff \
  | grep -C1 'x X'
all done
V240
x X ps: foo
cb

$ printf '\\X#ps: foo\nbar#\n.tm all done\n' | groff -Z | grep -C1 'x X'
all done
DFd
x X ps: foo
wh2500

[3] https://savannah.gnu.org/bugs/?63011

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/tests/some_escapes_accept_newline_delimiters.sh?h=1.23.0




Re: Proposed: make \X read its argument in copy mode

2024-01-20 Thread G. Branden Robinson
Hi Deri,

At 2024-01-20T21:03:15+, Deri wrote:
> On Saturday, 20 January 2024 01:39:21 GMT G. Branden Robinson wrote:
[snip]
> > x X ps: exec [4849204445524920F09F9888 pdfmark2
> > 
> > Something pretty close to that works on the deri-gropdf-ng branch
> > today, as I understand it.
> 
> I'm afraid this is all wrong (or at least out of date, my private
> branch, which is rebased against a very recent HEAD, does not use
> stringhex as part of the interface with gropdf,

Ahh.  A day without wrongness is like Mordor without orcs.  

> it only uses it to build register names which need to include unicode
> characters within the name).

Yes.  I may have a minor issue with that from a robustness perspective
but it doesn't have anything to do with \X or device control commands;
it's purely a macro programming level matter.  When I get some round
tuits I'll raise it in a new thread or a Savannah ticket.  And I'll
try to check my facts first.  ;-)

> In fact you know all this since you recently wrote:-

Plenty of people know what they don't know, and plenty more don't know
what they don't know, but I would claim that it takes real talent to not
know what you DO know.

> As an example, if this was in a file.mom:-
> 
> .HEADING 1 "Гуляйпольщина или Махновщина"
> 
> After running through preconv the resultant grout is:-
> 
> x X ps:exec [/Dest /pdf:bm24 /Title (8. \[u0413]\[u0443]\[u043B]\[u044F]\
> [u0439]\[u043F]\[u043E]\[u043B]\[u044C]\[u0449]\[u0438]\[u043D]\[u0430] \
> [u0438]\[u043B]\[u0438] \[u041C]\[u0430]\[u0445]\[u043D]\[u043E]\[u0432]\
> [u0449]\[u0438]\[u043D]\[u0430]) /Level 2 /OUT pdfmark
> 
> And the entry in the pdf looks like this:-
> 
> 99 0 obj << /Dest /pdf:bm24
> /Next 100 0 R
> /Parent 77 0 R
> /Prev 98 0 R
> /Title 
> (\376\377\0\70\0\56\0\40\4\23\4\103\4\73\4\117\4\71\4\77\4\76\4\73\4\114\4\111\4\70\4\75\4\60\0\40\4\70\4\73\4\70\0\40\4\34\4\60\4\105\4\75\4\76\4\62\4\111\4\70\4\75\4\60)
> >>
> endobj
> 
> The preconv unicodes have been converted to octal bytes with a UTF-16
> BOM on the front,

As a terminology stickler, I would not call these "preconv unicodes",
and IMO UTF-16 should usually be spelled with the endianness included...
But, yes, I take your point.

> and a pdf viewer will show the string with unicode characters in its
> bookmark panel. No stringhex involved, just passing preconv output
> straight to gropdf.

Cool.  I perceive that something I want is a unit test for this,
possibly a minimal mom(7) document containing the foregoing heading and
as little else as possible.  So I'll work on that while the
\X-copy-mode item percolates on the discussion table a while longer.

(Who, me, mix metaphors?)

> This is exactly the technique I am now using. Whatever preconv
> produces, ends up as a UTF-16 string. You can mix normal text with the
> preconv output, (and groff characters like \[em]), but as soon as any
> character in the string requires unicode the whole string is
> converted.

This seems like a reasonable approach, to keep from having to manage
state.  ("Are we in ASCII mode or octal mode?")

Regards,
Branden




Re: Proposed: make \X read its argument in copy mode

2024-01-20 Thread G. Branden Robinson
Hi Deri,

At 2024-01-20T21:03:11+, Deri wrote:
> > can't transparently output node at top level
> > 
> > But the reason 1.23.0 doesn't throw these errors is because I hid
> > them, not because we fixed them.[7]
> 
> It might be worth clarifying what caused this error to appear
> (before you suppressed it in 1.23.0).

Certainly.  I think we're so far down in the weeds relative to a daily
*roff user's experience that concrete examples are especially helpful.

> A particularly "fruity" bookmark appears in the mom example file
> mom-pdf.mom. It uses:-
> 
> .HEADING 1 \
> "Comparison of \-Tps\*[FU4]/\*[FU2]\-mpdfmark with \-Tpdf\*[FU4]/\*[FU2]\-mom
> 
> Which after expansion becomes this:-
> 
> 7. Comparison of \-Tps\h'(\En[.ps]u/\E*[$KERN_UNIT]u*4u)'/\h'(\En[.ps]u/
> \E*[$KERN_UNIT]u*2u)'\-mpdfmark with \-Tpdf\h'(\En[.ps]u/
> \E*[$KERN_UNIT]u*4u)'/\h'(\En[.ps]u/\E*[$KERN_UNIT]u*2u)'\-mom
> 
> And this passed to .pdfbookmark!

Mmmm, fragrant!  Hint of citrus with an overpowering bouquet of durian!

> In the version of pdf.tmac used until now, this monstrous string is
> run through .asciify to produce:-
> 
> 7. Comparison of Tps/mpdfmark with Tpdf/mom
> 
> You can see that all the "\-" are missing, .asciify left them as
> nodes, and each of them would elicit the error.

Yes.  Not the most helpful behavior.  Some day I'd like to kill
`asciify`, or move it into a "string.tmac" file along with `length`,
`stringup`, `stringdown`, and so forth.  My notional string iterator
(`for`) should make that straightforward.

> So under 1.22.4 this is what the overview bookmark in the pdf looked
> like:-
> 
> 96 0 obj
> <<
> /Dest /pdf:bm23
> /Parent 93 0 R
> /Title (7. Comparison of Tps/mpdfmark with Tpdf/mom)
> /Prev 109 0 R 
> >>
> endobj
> 
> Obviously, using .asciify is not the answer, particularly since each
> unicode character (\[u]) is a node which can't be asciified, so
> gets dropped.

Right.  `asciify` promises things it can't deliver, unless you're
already a major expert and manage your expectations.

> So in the latest version of pdf.tmac, not incorporated by Branden yet,
> the use of asciify has been dropped and the complete, raw, string, is
> passed to the output driver, so it becomes gropdf's job to make sense
> of the bookmark. The grout output looks like:-
> 
> x X ps:exec [/Dest /pdf:bm23 /Title (7. Comparison of \-Tps\h'(\En[.ps]u/
> \E*[$KERN_UNIT]u*4u)'/\h'(\En[.ps]u/\E*[$KERN_UNIT]u*2u)'\-mpdfmark with \-
> Tpdf\h'(\En[.ps]u/\E*[$KERN_UNIT]u*4u)'/\h'(\En[.ps]u/\E*[$KERN_UNIT]u*2u)'\-
> mom) /Level 2 /OUT pdfmark

Pretty wild stuff.  I wonder what we can do to drop all the stuff that
the device control command won't have any use for _before_ packing it in
there.  Might need a macro that iterates through a string and drops all
nodes from it.  That's going to need 2 of the new features I have in
mind.

> But when gropdf writes the pdf it contains:-
> 
> 96 0 obj << /Dest /pdf:bm23
> /Parent 75 0 R 
> /Prev 91 0 R 
> /Title (7. Comparison of -Tps/-mpdfmark with -Tpdf/-mom)
> >>
> endobj

It appears that you're doing a lot of cleanup work in gropdf that I'd
prefer you didn't have to do.

> Which you can see is a more accurate rendition of what the bookmark
> should be. 

Very much so.
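
To make the kind of cleanup discussed above concrete, here is a minimal Python sketch. It is not gropdf's actual code (gropdf is written in Perl), and the function name is hypothetical. It assumes only two rules: horizontal-motion escapes of the form \h'...' are dropped, and \- becomes a plain hyphen.

```python
import re

# Matches a horizontal-motion escape such as \h'(\En[.ps]u/\E*[$KERN_UNIT]u*4u)'.
# Assumption: the delimiter is an apostrophe and the argument contains none.
HORIZONTAL_MOTION = re.compile(r"\\h'[^']*'")


def clean_bookmark_title(raw):
    """Reduce a raw bookmark string to plain text (illustrative only)."""
    without_motions = HORIZONTAL_MOTION.sub("", raw)
    # Treat the minus-sign escape \- as an ordinary hyphen.
    return without_motions.replace("\\-", "-")


RAW_TITLE = (
    r"7. Comparison of \-Tps\h'(\En[.ps]u/\E*[$KERN_UNIT]u*4u)'/"
    r"\h'(\En[.ps]u/\E*[$KERN_UNIT]u*2u)'\-mpdfmark with \-Tpdf"
    r"\h'(\En[.ps]u/\E*[$KERN_UNIT]u*4u)'/"
    r"\h'(\En[.ps]u/\E*[$KERN_UNIT]u*2u)'\-mom"
)

print(clean_bookmark_title(RAW_TITLE))
```

A real driver would have to cope with many more escape sequences than these two, which is the burden the paragraph above would rather not place on gropdf.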

> The new pdf.tmac with the now released gropdf successfully handles all
> unicode (\[u]), groff named glyphs (i.e. \[em] or \(em), and even
> \N'233' type, when they are passed to the output driver.  This means
> that passing unicode in device controls is not an issue at all, no
> need to invent a new way, just using the well established convention
> of using \[u] for the unicode characters, which preconv provides.

I didn't realize that.  This is great news!  I didn't want to _make_ you
handle the \[u] convention, not realizing you had already done the
work to support it.

It sounds like we're reading from the same hymnal on this issue.

The important difference with respect to the Subject: line is that, _if_
someone was using \X to construct these device control commands, then
where formerly they would be saying something like (please excuse my
pidgin PDF)

\X'ps: exec [/Dest /pdf:bm23 /Author Ephraim Bar-B\\[u0065_0301]cue'

...to pass through the Unicode composite character, if the community
ratifies (or silently assents to) my proposal to make \X read its
argument in copy mode, the extra escape character will stop being
necessary, just as it would not be necessary in a string or macro
definition (which are also read in copy mode).

\X'ps: exec [/Dest /pdf:bm23 /Author Ephraim Bar-B\[u0065_0301]cue'

I reckon the affected audience here is small, and possibly restricted to
participants in this thread.  But I of course will NEWS item it.

Still, I'll give people a bit longer to comment and opine.  If I get
antsy I can always push to a branch.

Regards,
Branden




Re: Proposed: make \X read its argument in copy mode

2024-01-20 Thread Deri
On Saturday, 20 January 2024 00:56:34 GMT G. Branden Robinson wrote:
> Hi Deri,
> 
> At 2024-01-20T00:07:21+, Deri wrote:
> > On Friday, 19 January 2024 21:39:57 GMT G. Branden Robinson wrote:
> > > Right.  Before I craft a lengthy response to this--did you see the
> > > footnote?
> > 
> > Yes, sorry, it didn't help. I'm just comparing output now with output
> > in 1.23.0 and what you claim you are doing is the reverse of what I'm
> > seeing.
> 
> I haven't yet pushed anything implementing my (new) intentions,
> reflected in the subject line.  I wanted to gather feedback first.
> 
> What happened was, I thought "the `device` request and `\X` escape
> sequence should behave the same, modulo the usual differences in parsing
> (delimitation vs. reading the rest of the line, the leading double quote
> mechanism in request form, and so forth)".
> 
> Historically, that has never been the case in groff.
> 
> Here's (the meat of) the actual test case I recently wrote and pushed.
> 
> input='.nf
> \X#bogus1: esc \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]#
> .device bogus1: req \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
> .ec @
> @X#bogus2: esc @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]#
> .device bogus2: req @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]'
> 
> I know that looks hairy as hell.  I'm testing several things.
> 
> Here is what the output of that test looks like on groff 1.22.3 and
> 1.22.4.
> 
> x X bogus1: esc man-beast\[u1F00] -
> x X bogus1: req @%man-beast\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
> x X bogus2: esc man-beast@[u1F00] -
> x X bogus2: req @%man-beast@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]
> 
> Observations of the above:
> 
> A.  When using `\X`, the escape sequences \%, \[u1F63C], \[aq], \[dq],
> \[ga], \[ha], \[rs], \[ti] all get discarded.
> 
> B.  When you change the escape character and self-quote it in the
> formatter, it comes out as-is in the device control command.  I
> found this absurd, since there is no such thing as an escape
> character in the device-independent output language, and whatever
> escaping convention a device-specific control command needs to come
> up with for things like, oh, expressing Unicode code points is
> necessarily independent of a random *roff document's choice of
> escape character anyway.
> 
> Here is what the test output looks like on groff 1.23.0.  It enabled a
> few more characters to get rendered in PDF bookmarks.
> 
> x X bogus1: esc man-beast\[u1F00] -'"`^\~
> x X bogus1: req @%man-beast\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
> x X bogus2: esc man-beast@[u1F00] -'"`^\~
> x X bogus2: req @%man-beast@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]
> 
> Here is what the test output looks like on groff Git HEAD.  It was my
> first stab at solving the problem, the one I am now having partial
> second thoughts about.
> 
> x X bogus1: esc man-beast\[u1F00] -'"`^\~
> x X bogus1: req man-beast\[u1F00] -'"`^\~
> x X bogus2: esc man-beast\[u1F00] -'"`^\~
> x X bogus2: req man-beast\[u1F00] -'"`^\~
> 
> I was briefly happy with this, but I started wondering what happens when
> you interpolate any crazy old damned string inside a device control
> command and I rapidly became uncomfortable.  Because `\X` does not read
> its argument in copy mode, it can get exposed to "nodes" (and in groff
> Git, `device` can too)--this is that old incomprehensible nemesis that
> afflicted pdfmom users relentlessly before 1.23.0.[1][2][3][4][5][6]
> 
>   can't transparently output node at top level
> 
> But the reason 1.23.0 doesn't throw these errors is because I hid them,
> not because we fixed them.[7]

Hi Branden,

It might be worth clarifying what caused this error to appear (before you 
suppressed it in 1.23.0). A particularly "fruity" bookmark appears in the mom 
example file mom-pdf.mom. It uses:-

.HEADING 1 \
"Comparison of \-Tps\*[FU4]/\*[FU2]\-mpdfmark with \-Tpdf\*[FU4]/\*[FU2]\-mom

Which after expansion becomes this:-

7. Comparison of \-Tps\h'(\En[.ps]u/\E*[$KERN_UNIT]u*4u)'/\h'(\En[.ps]u/
\E*[$KERN_UNIT]u*2u)'\-mpdfmark with \-Tpdf\h'(\En[.ps]u/
\E*[$KERN_UNIT]u*4u)'/\h'(\En[.ps]u/\E*[$KERN_UNIT]u*2u)'\-mom

And this passed to .pdfbookmark! In the version of pdf.tmac used until now, 
this monstrous string is run through .asciify to produce:-

7. Comparison of Tps/mpdfmark with Tpdf/mom

You can see that all the "\-" are missing, .asciify left them as nodes, and 
each of them would elicit the error. So under 1.22.4 this is what the overview 
bookmark in the pdf looked like:-

96 0 obj
<<
/Dest /pdf:bm23
/Parent 93 0 R
/Title (7. Comparison of Tps/mpdfmark with Tpdf/mom)
/Prev 109 0 R 
>>
endobj

Obviously, using .asciify is not the answer, particularly since each unicode 
character (\[u]) is a node which can't be asciified, so gets dropped. So 
in the latest version of pdf.tmac, not incorporated by Branden 

Re: Proposed: make \X read its argument in copy mode

2024-01-20 Thread Deri
On Saturday, 20 January 2024 01:39:21 GMT G. Branden Robinson wrote:
> [self-follow-up with correction]
> 
> At 2024-01-19T18:56:37-0600, G. Branden Robinson wrote:
> > This might be more accurately stated as:
> > 
> > 2) \X behaves like .device used to (in groff 1.23.0 and earlier).
> 
> [correction follows]
> And I repeat: this is _NOT_ a _hard_ prerequisite to expressing Unicode
> sequences in the output, but it seems useful so that authors of output
> drivers (and supporting macro files for them) can keep their sanity.
> 
> [elaboration]
> 
> What I mean is that we can pass Unicode between "pdf.tmac" and the
> output driver _today_.  Consider the following notional macro.
> 
> .de pdfmark2
> . nop \!x X ps:exec [\\$* pdfmark2
> ..
> 
> (The open bracket has something to do with PostScript syntax, I think.)
> 
> ...and it getting called by some other macro encoding the argument...
> 
> .de pdflink
> .  ds pdf*input \\$*\"
> .  encode pdf*input \" performs magic transformation, like "stringhex"
> .  pdfmark2 \\*[pdf*input]
> ..
> 
> ...and I have a document using these.
> 
> .H 1 "This is my heading"
> .pdflink "HI DERI 😈"
> 
> This ultimately would show up in the output as something like this.
> 
> x X ps: exec [4849204445524920F09F9888 pdfmark2
> 
> Something pretty close to that works on the deri-gropdf-ng branch today,
> as I understand it.

Hi Branden,

I'm afraid this is all wrong (or at least out of date, my private branch, 
which is rebased against a very recent HEAD, does not use stringhex as part of 
the interface with gropdf, it only uses it to build register names which need 
to include unicode characters within the name). In fact you know all this 
since you recently wrote:-

"Deri's right that his `stringhex` solution, and the underlying problem it
solves, aren't fundamentally about how the formatter talks to the device
driver (though that is ultimately a necessary step)", the bit in brackets is 
wrong.

As an example, if this was in a file.mom:-

.HEADING 1 "Гуляйпольщина или Махновщина"

After running through preconv the resultant grout is:-

x X ps:exec [/Dest /pdf:bm24 /Title (8. \[u0413]\[u0443]\[u043B]\[u044F]\
[u0439]\[u043F]\[u043E]\[u043B]\[u044C]\[u0449]\[u0438]\[u043D]\[u0430] \
[u0438]\[u043B]\[u0438] \[u041C]\[u0430]\[u0445]\[u043D]\[u043E]\[u0432]\
[u0449]\[u0438]\[u043D]\[u0430]) /Level 2 /OUT pdfmark

And the entry in the pdf looks like this:-

99 0 obj << /Dest /pdf:bm24
/Next 100 0 R
/Parent 77 0 R
/Prev 98 0 R
/Title 
(\376\377\0\70\0\56\0\40\4\23\4\103\4\73\4\117\4\71\4\77\4\76\4\73\4\114\4\111\4\70\4\75\4\60\0\40\4\70\4\73\4\70\0\40\4\34\4\60\4\105\4\75\4\76\4\62\4\111\4\70\4\75\4\60)
>>
endobj

The preconv unicodes have been converted to octal bytes with a UTF-16 BOM on 
the front, and a pdf viewer will show the string with unicode characters in 
its bookmark panel. No stringhex involved, just passing preconv output 
straight to gropdf.
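
The conversion described above can be sketched in a few lines of Python. This is only an illustration of the transformation visible in the two listings, not gropdf's own code (gropdf is written in Perl); the function name is hypothetical, and escaping of parentheses or backslashes in plain-ASCII titles is deliberately left out.

```python
import re

# Matches groff Unicode code point escapes such as \[u0413].
UNICODE_ESCAPE = re.compile(r"\\\[u([0-9A-F]{4,6})\]")


def pdf_text_string(grout_text):
    """Turn text with groff Unicode escapes into a PDF string literal."""
    text = UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), grout_text)
    if text.isascii():
        # Nothing needs Unicode, so the string stays as plain text.
        return "(" + text + ")"
    # As soon as one character needs Unicode, convert the whole string:
    # a byte order mark, then big-endian UTF-16, one octal escape per byte.
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "(" + "".join("\\%o" % byte for byte in data) + ")"


print(pdf_text_string(r"8. \[u0413]\[u0443]"))
```

Feeding it the start of the title above reproduces the first bytes of the /Title entry in the PDF object, including the \376\377 byte order mark.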

> But my _suggestion_ would be that we support something more like this.
> 
> x X ps: exec [HI DERI \[u00F0]\[u009F]\[u0098]\[u0088] pdfmark2
> 
> or this...
> 
> x X ps: exec [HI DERI \[uDE08]\[uD83D] pdfmark2
> 
> ...or even this...
> 
> x X ps: exec [HI DERI \[u1F608] pdfmark2
> 
> These are groffish ways of expressing UTF-8, UTF-16LE, and UTF-32,
> respectively.  The reuse of groff Unicode code point escape sequence
> syntax is, I would hope, more helpful than confusing.

This is exactly the technique I am now using. Whatever preconv produces, ends 
up as a UTF-16 string. You can mix normal text with the preconv output, (and 
groff characters like \[em]), but as soon as any character in the string 
requires unicode the whole string is converted.

Cheers

Deri
> My concerns are that (1) people don't have to use two different escaping
> conventions _within the formatter_ to get byte sequences to the output
> driver, and (2) that driver-supporting macro file writers don't have to
> handle a bunch of special cases in device control commands.
> 
> Those factors are what drive my proposal.
> 
> Regards,
> Branden








Re: Proposed: make \X read its argument in copy mode

2024-01-19 Thread G. Branden Robinson
[self-follow-up with correction]

At 2024-01-19T18:56:37-0600, G. Branden Robinson wrote:
> This might be more accurately stated as:
> 
> 2) \X behaves like .device used to (in groff 1.23.0 and earlier).

[correction follows]
And I repeat: this is _NOT_ a _hard_ prerequisite to expressing Unicode
sequences in the output, but it seems useful so that authors of output
drivers (and supporting macro files for them) can keep their sanity.

[elaboration]

What I mean is that we can pass Unicode between "pdf.tmac" and the
output driver _today_.  Consider the following notional macro.

.de pdfmark2
. nop \!x X ps:exec [\\$* pdfmark2
..

(The open bracket has something to do with PostScript syntax, I think.)

...and it getting called by some other macro encoding the argument...

.de pdflink
.  ds pdf*input \\$*\"
.  encode pdf*input \" performs magic transformation, like "stringhex"
.  pdfmark2 \\*[pdf*input]
..

...and I have a document using these.

.H 1 "This is my heading"
.pdflink "HI DERI 😈"

This ultimately would show up in the output as something like this.

x X ps: exec [4849204445524920F09F9888 pdfmark2

Something pretty close to that works on the deri-gropdf-ng branch today,
as I understand it.
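
For concreteness, the hex string above is what you get by taking the UTF-8 bytes of the argument and writing each as two hexadecimal digits. The Python sketch below stands in for the notional `encode` step; it is not the real `stringhex` macro, the function name is hypothetical, and it assumes the argument ends in U+1F608, which is what the trailing F09F9888 decodes to.

```python
def hex_encode(text):
    """Return the UTF-8 bytes of text as uppercase hexadecimal digits."""
    return text.encode("utf-8").hex().upper()


# "HI DERI " followed by U+1F608, written as an escape so that the
# example does not depend on the encoding of this message.
print(hex_encode("HI DERI \U0001F608"))
```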

But my _suggestion_ would be that we support something more like this.

x X ps: exec [HI DERI \[u00F0]\[u009F]\[u0098]\[u0088] pdfmark2

or this...

x X ps: exec [HI DERI \[uDE08]\[uD83D] pdfmark2

...or even this...

x X ps: exec [HI DERI \[u1F608] pdfmark2

These are groffish ways of expressing UTF-8, UTF-16LE, and UTF-32,
respectively.  The reuse of groff Unicode code point escape sequence
syntax is, I would hope, more helpful than confusing.
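
A small Python sketch shows how one code point, U+1F608, maps onto each of these forms. The helper names are hypothetical and this only illustrates the notation, not anything groff does internally. One caveat: Python's codecs emit the high surrogate (D83D) before the low one (DE08) whatever the byte order, so the UTF-16 line comes out in the opposite order from the middle example above.

```python
def groff_escapes(units):
    """Write each integer as a groff-style \\[uXXXX] escape."""
    return "".join("\\[u%04X]" % unit for unit in units)


def as_utf8(text):
    # One escape per UTF-8 byte.
    return groff_escapes(text.encode("utf-8"))


def as_utf16(text):
    # One escape per 16-bit code unit, in stream order.
    data = text.encode("utf-16-be")
    return groff_escapes(
        int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)
    )


def as_utf32(text):
    # One escape per code point.
    return groff_escapes(ord(char) for char in text)


for convert in (as_utf8, as_utf16, as_utf32):
    print(convert.__name__, convert("\U0001F608"))
```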

My concerns are that (1) people don't have to use two different escaping
conventions _within the formatter_ to get byte sequences to the output
driver, and (2) that driver-supporting macro file writers don't have to
handle a bunch of special cases in device control commands.

Those factors are what drive my proposal.

Regards,
Branden




Re: Proposed: make \X read its argument in copy mode

2024-01-19 Thread G. Branden Robinson
Hi Deri,

At 2024-01-20T00:07:21+, Deri wrote:
> On Friday, 19 January 2024 21:39:57 GMT G. Branden Robinson wrote:
> > Right.  Before I craft a lengthy response to this--did you see the
> > footnote?
> 
> Yes, sorry, it didn't help. I'm just comparing output now with output
> in 1.23.0 and what you claim you are doing is the reverse of what I'm
> seeing.

I haven't yet pushed anything implementing my (new) intentions,
reflected in the subject line.  I wanted to gather feedback first.

What happened was, I thought "the `device` request and `\X` escape
sequence should behave the same, modulo the usual differences in parsing
(delimitation vs. reading the rest of the line, the leading double quote
mechanism in request form, and so forth)".

Historically, that has never been the case in groff.

Here's (the meat of) the actual test case I recently wrote and pushed.

input='.nf
\X#bogus1: esc \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]#
.device bogus1: req \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
.ec @
@X#bogus2: esc @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]#
.device bogus2: req @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]'

I know that looks hairy as hell.  I'm testing several things.

Here is what the output of that test looks like on groff 1.22.3 and
1.22.4.

x X bogus1: esc man-beast\[u1F00] -
x X bogus1: req @%man-beast\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
x X bogus2: esc man-beast@[u1F00] -
x X bogus2: req @%man-beast@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]

Observations of the above:

A.  When using `\X`, the escape sequences \%, \[u1F63C], \[aq], \[dq],
\[ga], \[ha], \[rs], \[ti] all get discarded.

B.  When you change the escape character and self-quote it in the
formatter, it comes out as-is in the device control command.  I
found this absurd, since there is no such thing as an escape
character in the device-independent output language, and whatever
escaping convention a device-specific control command needs to come
up with for things like, oh, expressing Unicode code points is
necessarily independent of a random *roff document's choice of
escape character anyway.

Here is what the test output looks like on groff 1.23.0.  It enabled a
few more characters to get rendered in PDF bookmarks.

x X bogus1: esc man-beast\[u1F00] -'"`^\~
x X bogus1: req @%man-beast\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
x X bogus2: esc man-beast@[u1F00] -'"`^\~
x X bogus2: req @%man-beast@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]

Here is what the test output looks like on groff Git HEAD.  It was my
first stab at solving the problem, the one I am now having partial
second thoughts about.

x X bogus1: esc man-beast\[u1F00] -'"`^\~
x X bogus1: req man-beast\[u1F00] -'"`^\~
x X bogus2: esc man-beast\[u1F00] -'"`^\~
x X bogus2: req man-beast\[u1F00] -'"`^\~

I was briefly happy with this, but I started wondering what happens when
you interpolate any crazy old damned string inside a device control
command and I rapidly became uncomfortable.  Because `\X` does not read
its argument in copy mode, it can get exposed to "nodes" (and in groff
Git, `device` can too)--this is that old incomprehensible nemesis that
afflicted pdfmom users relentlessly before 1.23.0.[1][2][3][4][5][6]

can't transparently output node at top level

But the reason 1.23.0 doesn't throw these errors is because I hid them,
not because we fixed them.[7]

An aim of this proposal is to truly fix them.

I hope it will surprise no one to learn that I have recently also
updated our documentation regarding tokens, nodes, how these relate to
GNU troff's input processing, and related matters.

> I hope I don't elicit a too lengthy response.

I know such hope oft seems forlorn when talking to me...

> There are 3 logical possibilities for the list to decide:-
> 
> 1) .device behaves like \X.
> 
> This seems to be what Branden has done at the moment. Disadvantage is
> that as a by-product you can't send unicode to the output drivers
> using either method,

I'm not happy with this status quo, but this doesn't exactly mean you
"can't send Unicode to output drivers".  What you have to do is _decide
upon an encoding mechanism for them_.  That will be true no matter which
way we solve this.  But I think it's best if there is _one_ way (per
output driver, anyway), not two different ones depending on whether your
encoded Unicode sequence is passed via `device` or `\X`.  This stuff is
challenging enough to the user that that seems like gratuitous cruelty.

Unfortunately that _has been_ the status quo.

> and some escapes affect the text stream when the expectation is that
> things sent to the output driver should not affect the text stream.

Right.  That is what alarmed me about reading `device` and `\X`
arguments in interpretation mode.

> 2) \X behaves like .device.
> 
> This is what Branden said was the intention. This allows 

Re: Proposed: make \X read its argument in copy mode

2024-01-19 Thread Deri
On Friday, 19 January 2024 21:39:57 GMT G. Branden Robinson wrote:
> Hi Deri,
> 
> At 2024-01-19T21:16:54+, Deri wrote:
> > On Tuesday, 16 January 2024 19:22:48 GMT G. Branden Robinson wrote:
> > > Or: Should device control commands affect the environment?
> > > 
> > > I therefore propose to change this, and have the `\X` escape sequence
> > > read its argument in copy mode.  That will make it work like the
> > > `device` request in groff 1.23.0 and earlier[1].
> > 
> > This is not what I am seeing in current 'master/head'. [...]
> 
> Right.  Before I craft a lengthy response to this--did you see the
> footnote?

Hi Branden,

Yes, sorry, it didn't help. I'm just comparing output now with output in 
1.23.0 and what you claim you are doing is the reverse of what I'm seeing.

I hope I don't elicit a too lengthy response. There are 3 logical 
possibilities for the list to decide:-

1) .device behaves like \X.

This seems to be what Branden has done at the moment. Disadvantage is that as 
a by-product you can't send unicode to the output drivers using either method, 
and some escapes affect the text stream when the expectation is that things 
sent to the output driver should not affect the text stream.

2) \X behaves like .device.

This is what Branden said was the intention. This allows pdf title (normally 
shown in the window header in a pdf viewer) to use unicode.

3) Leave things as they were prior to recent commits.

It will be interesting to hear from as many people as possible which they 
think is the best option. I definitely think we should not be making the use 
of unicode harder.

Cheers 

Deri

> 
> Regards,
> Branden







Re: Proposed: make \X read its argument in copy mode

2024-01-19 Thread G. Branden Robinson
Hi Deri,

At 2024-01-19T21:16:54+, Deri wrote:
> On Tuesday, 16 January 2024 19:22:48 GMT G. Branden Robinson wrote:
> > Or: Should device control commands affect the environment?
> > 
> > I therefore propose to change this, and have the `\X` escape sequence
> > read its argument in copy mode.  That will make it work like the
> > `device` request in groff 1.23.0 and earlier[1].
> 
> This is not what I am seeing in current 'master/head'. [...]

Right.  Before I craft a lengthy response to this--did you see the
footnote?

> > [1] Earlier this week I pushed a change to make `device` read _its_
> > argument in interpretation, not copy, mode.  My second thoughts
> > about that are what prompted this proposal.
> >
> > See  for background.

Regards,
Branden




Re: Proposed: make \X read its argument in copy mode

2024-01-19 Thread Deri
On Tuesday, 16 January 2024 19:22:48 GMT G. Branden Robinson wrote:
> Or: Should device control commands affect the environment?
> 
...

> I therefore propose to change this, and have the `\X` escape sequence
> read its argument in copy mode.  That will make it work like the
> `device` request in groff 1.23.0 and earlier[1].

This is not what I am seeing in current 'master/head'. Using this as a test:-

===
.ds abc def
.br
black
\X'abc=\*[abc]\m[red]\(em\[u0431]'
red?\m[black]
.device device abc=\*[abc]\m[red]\(em\[u0431]
red?
===

With 1.23.0 it produces:-

x T ps
x res 72000 1 1
x init
p1
x font 5 TR
f5
s1
V12000
H72000
md
DFd
tblack
wh2500
V12000
H96160
mr 65535 0 0
x X abc=def
wh2500
tred?
wh5000
V12000
H120870
mr 0 0 0
x X device abc=def\m[red]\(em\[u0431]
tred?
n12000 0
x trailer
V792000
x stop

And the colour sequence of the words goes - black red black. You can also see 
the unicode character \[u0431] has been successfully passed to the 
postprocessor when using .device and also the \m[red] has not "leaked" into 
the text output stream but just passed to the postprocessor. The \X variant 
cleaned all the nodes before passing on what is left (and leaked red).
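To make this kind of comparison less error-prone, the colour, device-control, and text commands can be pulled out of the output mechanically. The following is a minimal Python sketch, not part of groff: the sample string is abridged from the 1.23.0 output above, and `summarize` is a hypothetical helper name.

```python
# Lines copied from the 1.23.0 output quoted above, abridged to the
# commands of interest; nothing here was newly generated by troff.
GROUT_1_23_0 = """\
md
tblack
mr 65535 0 0
x X abc=def
tred?
mr 0 0 0
x X device abc=def\\m[red]\\(em\\[u0431]
tred?
"""


def summarize(grout):
    """Return colour, device-control, and text commands in output order."""
    events = []
    for line in grout.splitlines():
        if line.startswith("x X "):
            # Device control command: keep everything after "x X ".
            events.append(("device", line[4:]))
        elif line.startswith("m"):
            # Stroke colour command, e.g. "md" or "mr 65535 0 0".
            events.append(("colour", line[1:].strip()))
        elif line.startswith("t"):
            # Text (word) command, e.g. "tblack".
            events.append(("text", line[1:]))
    return events


for kind, value in summarize(GROUT_1_23_0):
    print(f"{kind:7} {value}")
```

Running the same function over the output from master and comparing the two lists shows at a glance whether a colour command appears before a device control command, and what text the device control command carries.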

Now on current master which contains the changes on which you are asking us to 
comment, this is the result:-

x T ps
x res 72000 1 1
x init
p1
troff:X.trf:4: error: special character 'em' is invalid within a device control command
troff:X.trf:4: error: special character 'u0431' is invalid within a device control command
troff:X.trf:6: error: special character 'em' is invalid within a device control command
troff:X.trf:6: error: special character 'u0431' is invalid within a device control command
x font 5 TR
f5
s1
V12000
H72000
md
DFd
tblack
wh2500
V12000
H96160
mr 65535 0 0
x X abc=def
wh2500
tred?
wh5000
V12000
H120870
x X device abc=def
tred?
n12000 0
x trailer
V792000
x stop

Now we can see that both \X and .device are behaving the same way as \X used
to (with the addition of a new error to document that the facility to pass
Unicode characters, and others, has been withdrawn). Plus, both methods are
now a leaky red!

You appear to have achieved the exact opposite of what you set out to achieve 
- "make it (\X) work like the device request in 1.23.0 and earlier". I think 
your instincts are correct, once you have completed your for loop the removal 
of unwanted nodes from a string will be simple, so it would not be necessary 
to rely on \X doing it for you. The device request currently operates as \X is 
documented in CSTR #54 so it makes sense to have our \X behave the same.

Usually it is better to preserve data rather than arbitrarily discard it so 
that it can't be recovered, so I agree with your desire to make \X behave like 
.device has always behaved, but possibly after your "for" request is ready so 
people have a simple way of choosing the current behaviour, i.e. removing 
nodes from a string or passing the string as a whole.

https://savannah.gnu.org/bugs/?63074, which is titled "develop convention
for encoding Unicode character sequences for passage to device control
commands", shows you understand the necessity of being able to pass all
Unicode and other characters to postprocessors, and are aware that .device was
already capable of doing that. I have no objection to you extending this
capability to \X if that is your wont, but the current state of master is the
opposite.

Cheers

Deri







Re: Proposed: make \X read its argument in copy mode

2024-01-17 Thread John Gardner
>
> This assumes you know both the desired font and the desired colour, which
> might be defined at other places in the document and not under your control.


Yeah, I know. I was trying to gauge how Groff's escape sequences might
benefit an \X'…'  sequence, and the PostScript I gave was a
contrived—albeit functional—example of interpolating the current font-size.

(Also, PostScript and grops(1) expect to work in two very different
coordinate systems, as the former's origin starts in the bottom-left corner
of the page).

On Thu, 18 Jan 2024 at 04:26, Tadziu Hoffmann 
wrote:

>
> > > \fB\s(12\m[red]\X'ps: big bold red text in my device command'\fP
>
> > \X'ps: exec 1.0 0 0 setrgbcolor /Times-Bold findfont \n[.s] scalefont setfont (Text) show'
>
> This assumes you know both the desired font and the desired
> color, which might be defined at other places in the document
> and not under your control.  Thus, unless you need multiple
> colors/fonts/sizes within the device code, it is probably more
> practical to set these outside, as in Branden's original sketch.
>
> Here is a possibly useful example:
>
>   .defcolor my-outline-color rgb 0.9 0 0.7
>   .fp 4 BI LinLibertineOBI
>   .\" 
>   .de outline
>   \Z'\N'32''\X'ps: exec \\n(.s 0.01 mul setlinewidth (\\$1) true charpath stroke'\h'\w'\\$1'u'
>   ..
>   .\" 
>   Here is some
>   .gcolor my-outline-color
>   .ft BI
>   .outline outlined\/
>   .gcolor
>   .ft
>   text.
>
> (Note that this code is not optimal, in particular because
> grops does not set the font unless it is outputting something,
> necessitating the hack of printing an explicit space with
> \N'32' in order to get grops to set the desired font.)
>
>
>


Re: Proposed: make \X read its argument in copy mode

2024-01-17 Thread Tadziu Hoffmann

> > \fB\s(12\m[red]\X'ps: big bold red text in my device command'\fP

> \X'ps: exec 1.0 0 0 setrgbcolor /Times-Bold findfont \n[.s] scalefont setfont (Text) show'

This assumes you know both the desired font and the desired
color, which might be defined at other places in the document
and not under your control.  Thus, unless you need multiple
colors/fonts/sizes within the device code, it is probably more
practical to set these outside, as in Branden's original sketch.

Here is a possibly useful example:

  .defcolor my-outline-color rgb 0.9 0 0.7
  .fp 4 BI LinLibertineOBI
  .\" 
  .de outline
  \Z'\N'32''\X'ps: exec \\n(.s 0.01 mul setlinewidth (\\$1) true charpath stroke'\h'\w'\\$1'u'
  ..
  .\" 
  Here is some
  .gcolor my-outline-color
  .ft BI
  .outline outlined\/
  .gcolor
  .ft
  text.

(Note that this code is not optimal, in particular because
grops does not set the font unless it is outputting something,
necessitating the hack of printing an explicit space with
\N'32' in order to get grops to set the desired font.)




outlined.pdf
Description: Adobe PDF document


Re: Proposed: make \X read its argument in copy mode

2024-01-17 Thread G. Branden Robinson
Hi John,

At 2024-01-18T00:32:04+1100, John Gardner wrote:
> So instead of:
> > \X'ps: \fB\s(12\m[red]big bold red text in my device command\fP'
> >
> > one would write:
> > \fB\s(12\m[red]\X'ps: big bold red text in my device command'\fP
> 
> I believe you meant to provide an example more like this?
> 
> \X'ps: exec 1.0 0 0 setrgbcolor /Times-Bold findfont \n[.s] scalefont setfont (Text) show'

Well, not exactly, as I don't speak PostScript.  But you're offering a
good example of the sort of thing that my proposed change would _not_
affect: register interpolation escape sequences are interpolated in copy
mode, so assuming that the `.s` register contains an appropriate value,
then what you have should work fine, and in any case the same before and
after my proposed change.

A refresher on the definition of "copy mode" might be useful to the
discussion.

---snip groff info manual---
5.24.2 Copy Mode


GNU 'troff' processes certain requests in "copy mode": it interpolates
the escape sequences '\n', '\g', '\$', '\*', '\V', and '\?' normally;
interprets '\<RET>' immediately; discards comments '\"' and '\#';
interpolates the current leader, escape, or tab character with '\a',
'\e', and '\t', respectively; and represents all other escape sequences
in an encoded form.  The term "copy mode" reflects its most visible
application in requests that populate macros and strings, but other
requests also use it when interpreting arguments that can't meaningfully
represent typesetting operations.  For example, a font selection escape
sequence has no meaning in a hyphenation pattern file name ('hpf') or a
diagnostic message written to the terminal ('tm').

   The complement of copy mode--a 'roff' formatter's behavior when not
defining or appending to a macro, string, or diversion--where all macros
are interpolated, requests invoked, and valid escape sequences processed
immediately upon recognition, can be termed "interpretation mode".
--end snip---

And for those who don't have all the escape sequences memorized, here
are the ones that get interpreted even in copy mode.

---snip groff(7)---
\n[reg]
Interpolate contents of register with arbitrarily long
name reg.
\g[reg]
Interpolate format of register with arbitrarily long name
reg.
\$[nnn]
Interpolate macro or string parameter numbered nnn (nnn≥1).
\*[string arg ...]
Interpolate string with name string (of arbitrary length),
taking arg ... as arguments.
\V[env]
Interpolate contents of environment variable with
arbitrarily long name env.
\?anything\?
Transparently embed anything, read in copy mode, in a
diversion, or unformatted as an output comparand in a
conditional expression.  Ignored in the top‐level diversion.
---end snip---
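
For quick reference, the rules in the two excerpts above can be condensed into a small lookup. This is a minimal Python sketch of the classification only, assuming the escape sequence is given with its leading backslash; `copy_mode_treatment` is a hypothetical name, and nothing here reflects how GNU troff is actually implemented.

```python
def copy_mode_treatment(escape):
    """Classify an escape sequence by how copy mode handles it."""
    name = escape[1]  # the character right after the backslash
    if name in "ng$*V?":
        return "interpolated"
    if name in "aet":
        return "replaced by current leader, escape, or tab character"
    if name in '"#':
        return "discarded as comment"
    return "encoded, not executed"


# Escapes taken from the examples in this thread.
for esc in (r"\n[.s]", r"\*[abc]", r"\fB", r"\s(12", r"\m[red]"):
    print(esc, "->", copy_mode_treatment(esc))
```

The register and string interpolations fall in the first group, which is why they keep working inside a device control command read in copy mode, while the font, size, and colour escapes fall through to the last case.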

Regards,
Branden




Re: Proposed: make \X read its argument in copy mode

2024-01-17 Thread John Gardner
Hi Branden,

So instead of:
> \X'ps: \fB\s(12\m[red]big bold red text in my device command\fP'
>
> one would write:
> \fB\s(12\m[red]\X'ps: big bold red text in my device command'\fP


I believe you meant to provide an example more like this?

\X'ps: exec 1.0 0 0 setrgbcolor /Times-Bold findfont \n[.s] scalefont setfont (Text) show'

Regards,
— John


On Wed, 17 Jan 2024 at 06:23, G. Branden Robinson <
g.branden.robin...@gmail.com> wrote:

> Or: Should device control commands affect the environment?
>
> Recall the definition of the \X escape sequence from CSTR #54 (1992).
>
>   10.7.  Transparent output.  The sequence \X'anything' copies anything
>   to the output, as a device control function of the form x X anything
>   (§22).  Escape sequences in anything are processed.
>
> The foregoing doesn't say anything about copy mode.  This has a
> brow-raising consequence.
>
> Consider the following input.
>
> $ cat backslash-X-affects-environment.roff
> .sp
> .tm .f=\n(.f
> \X'\fB'
> .tm .f=\n(.f
>
> Documenter's Workbench 3.3 troff, Heirloom Doctools troff, and GNU troff
> all produce the same output to the standard error stream.
>
> .f=1
> .f=3
>
> What they produce as device-independent output might be even more
> interesting.
>
> $ DWBHOME=. ./bin/troff backslash-X-affects-environment.roff \
>   | grep '^x X'
> .f=1
> .f=3
> x X
>
> $ ./bin/troff backslash-X-affects-environment.roff | grep '^x X'
> .f=1
> .f=3
> x X LC_CTYPE en_US.UTF-8
> x X
>
> $ ~/groff-stable/bin/groff -Z ./backslash-X-affects-environment.roff \
>   | grep '^x X'
> .f=1
> .f=3
> x X
>
> Nothing.
>
> So "anything" doesn't exactly make it to the output as a device control,
> "transparently" or otherwise.  This is because a handful of escape
> sequences, like \f, \s, and the GNU extensions \m and \M (which set the
> stroke and fill colors, respectively) are never turned into nodes by the
> tokenization process; instead, (when not in copy mode) they immediately
> alter the current environment.
>
> Another thing to know is that there is no such thing as an environment
> in troff's output language.  Changes to the environment manifest as one
> or more other output commands, like 'f' for font selection, or 's' to
> set the type size.
>
> The foregoing exhibits seem like evidence of a design wart to me.  On
> no *roff do such escape sequences survive to device independent output;
> they can't, because they have no (direct) representation there.
>
> I therefore propose to change this, and have the `\X` escape sequence
> read its argument in copy mode.  That will make it work like the
> `device` request in groff 1.23.0 and earlier[1].
>
> Some things this wouldn't change:
>
> 1.  The ability to interpolate registers and strings inside device
> control commands--which I would guess is the main reason "Escape
> sequences in anything are processed" as CSTR #54 puts it--remains.
>
> 2.  The ability to affect the environment "simultaneously" with a device
> control command remains possible; just put those escape sequences
> _outside_ the device control escape sequence.
>
>    So instead of:
>
>    \X'ps: \fB\s(12\m[red]big bold red text in my device command\fP'
>
>    one would write:
>
>    \fB\s(12\m[red]\X'ps: big bold red text in my device command'\fP
>
>    ...though I am dubious that the former has ever been well-defined
>    as far as interactions with device operations go.
>
> Thoughts?  Objections?  Counterexamples?
>
> Regards,
> Branden
>
> [1] Earlier this week I pushed a change to make `device` read _its_
> argument in interpretation, not copy, mode.  My second thoughts
> about that are what prompted this proposal.
>
> See  for background.
>


Proposed: make \X read its argument in copy mode

2024-01-16 Thread G. Branden Robinson
Or: Should device control commands affect the environment?

Recall the definition of the \X escape sequence from CSTR #54 (1992).

  10.7.  Transparent output.  The sequence \X'anything' copies anything
  to the output, as a device control function of the form x X anything
  (§22).  Escape sequences in anything are processed.

The foregoing doesn't say anything about copy mode.  This has a
brow-raising consequence.

Consider the following input.

$ cat backslash-X-affects-environment.roff
.sp
.tm .f=\n(.f
\X'\fB'
.tm .f=\n(.f

Documenter's Workbench 3.3 troff, Heirloom Doctools troff, and GNU troff
all produce the same output to the standard error stream.

.f=1
.f=3

What they produce as device-independent output might be even more
interesting.

$ DWBHOME=. ./bin/troff backslash-X-affects-environment.roff \
  | grep '^x X'
.f=1
.f=3
x X

$ ./bin/troff backslash-X-affects-environment.roff | grep '^x X'
.f=1
.f=3
x X LC_CTYPE en_US.UTF-8
x X

$ ~/groff-stable/bin/groff -Z ./backslash-X-affects-environment.roff \
  | grep '^x X'
.f=1
.f=3
x X 

Nothing.

So "anything" doesn't exactly make it to the output as a device control,
"transparently" or otherwise.  This is because a handful of escape
sequences, like \f, \s, and the GNU extensions \m and \M (which set the
stroke and fill colors, respectively) are never turned into nodes by the
tokenization process; instead, (when not in copy mode) they immediately
alter the current environment.

Another thing to know is that there is no such thing as an environment
in troff's output language.  Changes to the environment manifest as one
or more other output commands, like 'f' for font selection, or 's' to
set the type size.

The foregoing exhibits seem like evidence of a design wart to me.  On
no *roff do such escape sequences survive to device independent output;
they can't, because they have no (direct) representation there.

I therefore propose to change this, and have the `\X` escape sequence
read its argument in copy mode.  That will make it work like the
`device` request in groff 1.23.0 and earlier[1].
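
The difference between the two reading modes can be illustrated with a toy model. The Python sketch below is only an illustration under simplifying assumptions, not GNU troff's implementation: the environment is a plain dictionary, the matcher recognizes only a few forms of \f, \s, \m, and \M, and register and string interpolation (which happens in both modes) is left out. The names `read_interpretation_mode` and `read_copy_mode` are hypothetical.

```python
import re

# Toy matcher for the environment-changing escapes named above
# (\f, \s, \m, \M).  It covers only forms such as \fB, \f[B], \s(12,
# \s[12], and \m[red]; real roff syntax has more variants.
ENV_ESCAPE = re.compile(
    r"\\(?:f(?:\[[^\]]*\]|\(..|.)"
    r"|s(?:\[[^\]]*\]|\(\d\d|\d)"
    r"|[mM]\[[^\]]*\])"
)


def read_interpretation_mode(arg, env):
    """Apply environment escapes to env and drop them from the text."""
    def apply(match):
        esc = match.group(0)
        env[esc[1]] = esc[2:]  # e.g. env["f"] = "B"
        return ""              # nothing reaches the device control
    return ENV_ESCAPE.sub(apply, arg)


def read_copy_mode(arg, env):
    """Leave env untouched; the escapes stay in the argument text."""
    return arg


arg = r"ps: \fB\s(12\m[red]big bold red text in my device command\fP"

env = {}
print(repr(read_interpretation_mode(arg, env)), env)

env = {}
print(repr(read_copy_mode(arg, env)), env)
```

In the first call the escapes vanish from the argument and the environment changes as a side effect, which mirrors the `\X'\fB'` exhibit; in the second the environment is left alone.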

Some things this wouldn't change:

1.  The ability to interpolate registers and strings inside device
control commands--which I would guess is the main reason "Escape
sequences in anything are processed" as CSTR #54 puts it--remains.

2.  The ability to affect the environment "simultaneously" with a device
control command remains possible; just put those escape sequences
_outside_ the device control escape sequence.

   So instead of:

   \X'ps: \fB\s(12\m[red]big bold red text in my device command\fP'

   one would write:

   \fB\s(12\m[red]\X'ps: big bold red text in my device command'\fP

   ...though I am dubious that the former has ever been well-defined
   as far as interactions with device operations go.

Thoughts?  Objections?  Counterexamples?

Regards,
Branden

[1] Earlier this week I pushed a change to make `device` read _its_
argument in interpretation, not copy, mode.  My second thoughts
about that are what prompted this proposal.

See  for background.

