On Saturday, 20 January 2024 01:39:21 GMT G. Branden Robinson wrote:
> [self-follow-up with correction]
> 
> At 2024-01-19T18:56:37-0600, G. Branden Robinson wrote:
> > This might be more accurately stated as:
> > 
> > 2) \X behaves like .device used to (in groff 1.23.0 and earlier).
> 
> [correction follows]
> And I repeat: this is _NOT_ a _hard_ prerequisite to expressing Unicode
> sequences in the output, but it seems useful so that authors of output
> drivers (and supporting macro files for them) can keep their sanity.
> 
> [elaboration]
> 
> What I mean is that we can pass Unicode between "pdf.tmac" and the
> output driver _today_.  Consider the following notional macro.
> 
> .de pdfmark2
> . nop \!x X ps:exec [\\$* pdfmark2
> ..
> 
> (The open bracket has something to do with PostScript syntax, I think.)
> 
> ...and it getting called by some other macro encoding the argument...
> 
> .de pdflink
> .  ds pdf*input \\$*\"
> .  encode pdf*input \" performs magic transformation, like "stringhex"
> .  pdfmark2 \\*[pdf*input]
> ..
> 
> ...and I have a document using these.
> 
> .H 1 "This is my heading"
> .pdflink "HI DERI 😈"
> 
> This ultimately would show up in the output as something like this.
> 
> x X ps: exec [4849204445524920F09F9888 pdfmark2
> 
> Something pretty close to that works on the deri-gropdf-ng branch today,
> as I understand it.

Hi Branden,

I'm afraid this is all wrong, or at least out of date. My private branch, 
which is rebased against a very recent HEAD, does not use stringhex as part of 
the interface with gropdf; it only uses it to build register names which need 
to include Unicode characters within the name. In fact you know all this, 
since you recently wrote:-

"Deri's right that his `stringhex` solution, and the underlying problem it
solves, aren't fundamentally about how the formatter talks to the device
driver (though that is ultimately a necessary step)". The bit in brackets is 
wrong.

As an example, if this was in a file.mom:-

.HEADING 1 "Гуляйпольщина или Махновщина"

After running through preconv the resultant grout is:-

x X ps:exec [/Dest /pdf:bm24 /Title (8. \[u0413]\[u0443]\[u043B]\[u044F]\
[u0439]\[u043F]\[u043E]\[u043B]\[u044C]\[u0449]\[u0438]\[u043D]\[u0430] \
[u0438]\[u043B]\[u0438] \[u041C]\[u0430]\[u0445]\[u043D]\[u043E]\[u0432]\
[u0449]\[u0438]\[u043D]\[u0430]) /Level 2 /OUT pdfmark

And the entry in the pdf looks like this:-

99 0 obj << /Dest /pdf:bm24
/Next 100 0 R
/Parent 77 0 R
/Prev 98 0 R
/Title 
(\376\377\0\70\0\56\0\40\4\23\4\103\4\73\4\117\4\71\4\77\4\76\4\73\4\114\4\111\4\70\4\75\4\60\0\40\4\70\4\73\4\70\0\40\4\34\4\60\4\105\4\75\4\76\4\62\4\111\4\70\4\75\4\60)
>>
endobj

The preconv Unicode escapes have been converted to UTF-16BE bytes, written as 
octal escapes with a byte order mark (\376\377) on the front, and a PDF viewer 
will show the string with Unicode characters in its bookmark panel. No 
stringhex involved, just passing preconv output straight to gropdf.
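
For illustration only, the conversion can be sketched in Python. gropdf
itself is Perl and this is not its code; the name to_pdf_utf16 is made up
for the sketch, and it assumes only plain \[uXXXX] escapes appear (no
composite escapes, no named glyphs such as \[em]).

```python
import re

# Matches one groff Unicode escape such as \[u0413] (4 to 6 hex digits).
UNI_ESCAPE = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")


def to_pdf_utf16(text):
    """Hypothetical helper: turn text containing \\[uXXXX] escapes into the
    body of a PDF literal string, BOM first, every byte an octal escape."""
    # Replace each escape with the character it names.
    decoded = UNI_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Big-endian UTF-16 preceded by the byte order mark FE FF.
    data = b"\xfe\xff" + decoded.encode("utf-16-be")
    # Unpadded octal, as in the /Title entry above (\0\70 rather than \000\070).
    return "".join("\\%o" % b for b in data)


if __name__ == "__main__":
    print(to_pdf_utf16(r"8. \[u0413]"))
```

Feeding it the start of the /Title string, "8. \[u0413]", gives the first
few escapes of the entry shown above.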

> But my _suggestion_ would be that we support something more like this.
> 
> x X ps: exec [HI DERI \[u00F0]\[u009F]\[u0098]\[u0088] pdfmark2
> 
> or this...
> 
> x X ps: exec [HI DERI \[uDE08]\[uD83D] pdfmark2
> 
> ...or even this...
> 
> x X ps: exec [HI DERI \[u1F608] pdfmark2
> 
> These are groffish ways of expressing UTF-8, UTF-16LE, and UTF-32,
> respectively.  The reuse of groff Unicode code point escape sequence
> syntax is, I would hope, more helpful than confusing.

This is exactly the technique I am now using. Whatever preconv produces ends 
up as a UTF-16 string. You can mix normal text with the preconv output (and 
groff characters like \[em]), but as soon as any character in the string 
requires Unicode, the whole string is converted.
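
The all-or-nothing rule might be sketched like this in Python. It is not
gropdf's code: pdf_title is a made-up name, "requires Unicode" is assumed
here to mean "contains a non-ASCII character", and named glyphs such as
\[em] are not handled.

```python
import re

# Matches one groff Unicode escape such as \[u0413] (4 to 6 hex digits).
UNI_ESCAPE = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")


def pdf_title(text):
    """Hypothetical helper: return a parenthesised PDF literal string."""
    decoded = UNI_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    if all(ord(c) < 128 for c in decoded):
        # Assumption: pure ASCII stays a plain string; only the characters
        # that are special inside a PDF literal string get escaped.
        plain = decoded.replace("\\", "\\\\")
        plain = plain.replace("(", "\\(").replace(")", "\\)")
        return "(" + plain + ")"
    # One non-ASCII character is enough: the whole string, ASCII parts
    # included, becomes UTF-16BE with a leading byte order mark.
    data = b"\xfe\xff" + decoded.encode("utf-16-be")
    return "(" + "".join("\\%o" % b for b in data) + ")"


if __name__ == "__main__":
    print(pdf_title("Plain heading"))
    print(pdf_title(r"A \[u0413]"))
```

Note that in the second call the ASCII "A" and the space are widened to two
bytes each as well, because the string as a whole has switched encoding.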

Cheers

Deri
> My concerns are that (1) people don't have to use two different escaping
> conventions _within the formatter_ to get byte sequences to the output
> driver, and (2) that driver-supporting macro file writers don't have to
> handle a bunch of special cases in device control commands.
> 
> Those factors are what drive my proposal.
> 
> Regards,
> Branden





