Re: Tibetan fonts

2024-01-20 Thread Oliver Corff

Hi Tom,

བཀྲ་ཤིས་བདེ་ལེགས།

(sorry, I forgot the shad)

have you tried the font TibMachUni-1.901b.ttf? It is available via the
package manager, e.g. in Fedora 39.
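
(On Fedora, something like "dnf search tibetan" should turn up the
packaged font; I don't recall the exact package name.)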

Can you post a source file for your examples, please? I may try it tomorrow.

Best regards,

Oliver.


On 21/01/2024 00:09, Tom wrote:

Hi,

I have typeset a few books in Heirloom Troff with quite good results.
For my next book I need to use a Tibetan font, and unfortunately I
can't make it work in Heirloom Troff or Groff; only Neatroff works.

Right now I'm eager to switch completely to Groff. I can make any font
work in Groff except Tibetan. To my limited knowledge, I guess the
problem is that the blwm and blws OpenType features are not accessible.
There are only a few Tibetan fonts with complex glyph composition that
work in Neatroff. In Groff I have managed to display only
BabelStoneTibetan, but several glyphs don't compose.

For viewing, I have attached the Groff and Neatroff PDFs.

YagpoTibetanUni: composes 100% in Neatroff, but fails 100% in Groff
and Heirloom Troff.
BabelStoneTibetan: fails 100% in both Groff and Neatroff.

I have tried various Tibetan fonts, as well as NotoSerifTibetan, which
is present in Linux repositories. None of those fonts work, and groff yields:
troff::20: warning: special character 'u0F04' not defined
troff::20: warning: special character 'u0F05' not defined
troff::20: warning: special character 'u0F0D' not defined
...

If I understand correctly, the warnings are about missing glyph mappings.
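
To illustrate what I think is missing: groff looks up glyphs by names
such as u0F04 in a font description file under its device font
directory (typically something like /usr/share/groff/<version>/font/devpdf),
a file normally generated from the font's metrics with afmtodit rather
than written by hand. A purely hypothetical entry for the tsheg mark
U+0F0D, with invented metrics and code, might look like:

u0F0D  213,647  0  0x0F0D

Even if such entries existed, I understand groff does no OpenType
shaping of its own, so the blws/blwm-driven composition would
presumably still be missing.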

Would you mind having a look and checking whether Tibetan fonts can be
made to work in Groff? I would appreciate any hints or directions I can
follow to make it happen.


Regards,
Tom


--
Dr. Oliver Corff
Wittelsbacherstr. 5A
10707 Berlin
GERMANY
Tel.: +49-30-85727260
mailto:oliver.co...@email.de


Re: Proposed: make \X read its argument in copy mode

2024-01-20 Thread G. Branden Robinson
Hi Deri,

At 2024-01-20T21:03:15+, Deri wrote:
> On Saturday, 20 January 2024 01:39:21 GMT G. Branden Robinson wrote:
[snip]
> > x X ps: exec [4849204445524920F09F9888 pdfmark2
> > 
> > Something pretty close to that works on the deri-gropdf-ng branch
> > today, as I understand it.
> 
> I'm afraid this is all wrong (or at least out of date, my private
> branch, which is rebased against a very recent HEAD, does not use
> stringhex as part of the interface with gropdf,

Ahh.  A day without wrongness is like Mordor without orcs.  

> it only uses it to build register names which need to include unicode
> characters within the name).

Yes.  I may have a minor issue with that from a robustness perspective
but it doesn't have anything to do with \X or device control commands;
it's purely a macro programming level matter.  When I get some round
tuits I'll raise it in a new thread or a Savannah ticket.  And I'll
try to check my facts first.  ;-)

> In fact you know all this since you recently wrote:-

Plenty of people know what they don't know, and plenty more don't know
what they don't know, but I would claim that it takes real talent to not
know what you DO know.

> As an example, if this was in a file.mom:-
> 
> .HEADING 1 "Гуляйпольщина или Махновщина"
> 
> After running through preconv the resultant grout is:-
> 
> x X ps:exec [/Dest /pdf:bm24 /Title (8. \[u0413]\[u0443]\[u043B]\[u044F]\
> [u0439]\[u043F]\[u043E]\[u043B]\[u044C]\[u0449]\[u0438]\[u043D]\[u0430] \
> [u0438]\[u043B]\[u0438] \[u041C]\[u0430]\[u0445]\[u043D]\[u043E]\[u0432]\
> [u0449]\[u0438]\[u043D]\[u0430]) /Level 2 /OUT pdfmark
> 
> And the entry in the pdf looks like this:-
> 
> 99 0 obj << /Dest /pdf:bm24
> /Next 100 0 R
> /Parent 77 0 R
> /Prev 98 0 R
> /Title 
> (\376\377\0\70\0\56\0\40\4\23\4\103\4\73\4\117\4\71\4\77\4\76\4\73\4\114\4\111\4\70\4\75\4\60\0\40\4\70\4\73\4\70\0\40\4\34\4\60\4\105\4\75\4\76\4\62\4\111\4\70\4\75\4\60)
> >>
> endobj
> 
> The preconv unicodes have been converted to octal bytes with a UTF-16
> BOM on the front,

As a terminology stickler, I would not call these "preconv unicodes",
and IMO UTF-16 should usually be spelled with the endianness included...
But, yes, I take your point.
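
(To make the byte mapping concrete: the leading \376\377 is the
big-endian BOM 0xFE 0xFF, and each UTF-16 code unit then follows as two
octal-escaped bytes, so "8" (U+0038) becomes \0\70, "." (U+002E)
becomes \0\56, the space becomes \0\40, and the first Cyrillic letter
Г (U+0413) becomes \4\23, exactly the prefix visible in the /Title
string above.)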

> and a pdf viewer will show the string with unicode characters in its
> bookmark panel. No stringhex involved, just passing preconv output
> straight to gropdf.

Cool.  I perceive that something I want is a unit test for this,
possibly a minimal mom(7) document containing the foregoing heading and
as little else as possible.  So I'll work on that while the
\X-copy-mode item percolates on the discussion table a while longer.

(Who, me, mix metaphors?)

> This is exactly the technique I am now using. Whatever preconv
> produces ends up as a UTF-16 string. You can mix normal text with the
> preconv output (and groff characters like \[em]), but as soon as any
> character in the string requires unicode the whole string is
> converted.

This seems like a reasonable approach, to keep from having to manage
state.  ("Are we in ASCII mode or octal mode?")

Regards,
Branden


signature.asc
Description: PGP signature


Re: Proposed: make \X read its argument in copy mode

2024-01-20 Thread G. Branden Robinson
Hi Deri,

At 2024-01-20T21:03:11+, Deri wrote:
> > can't transparently output node at top level
> > 
> > But the reason 1.23.0 doesn't throw these errors is because I hid
> > them, not because we fixed them.[7]
> 
> It might be worth clarifying what caused this error to appear
> (before you suppressed it in 1.23.0).

Certainly.  I think we're so far down in the weeds relative to a daily
*roff user's experience that concrete examples are especially helpful.

> A particularly "fruity" bookmark appears in the mom example file
> mom-pdf.mom. It uses:-
> 
> .HEADING 1 \
> "Comparison of \-Tps\*[FU4]/\*[FU2]\-mpdfmark with \-Tpdf\*[FU4]/\*[FU2]\-mom
> 
> Which after expansion becomes this:-
> 
> 7. Comparison of \-Tps\h'(\En[.ps]u/\E*[$KERN_UNIT]u*4u)'/\h'(\En[.ps]u/
> \E*[$KERN_UNIT]u*2u)'\-mpdfmark with \-Tpdf\h'(\En[.ps]u/
> \E*[$KERN_UNIT]u*4u)'/\h'(\En[.ps]u/\E*[$KERN_UNIT]u*2u)'\-mom
> 
> And this is passed to .pdfbookmark!

Mmmm, fragrant!  Hint of citrus with an overpowering bouquet of durian!

> In the version of pdf.tmac used until now, this monstrous string is
> run through .asciify to produce:-
> 
> 7. Comparison of Tps/mpdfmark with Tpdf/mom
> 
> You can see that all the "\-" are missing, .asciify left them as
> nodes, and each of them would elicit the error.

Yes.  Not the most helpful behavior.  Some day I'd like to kill
`asciify`, or move it into a "string.tmac" file along with `length`,
`stringup`, `stringdown`, and so forth.  My notional string iterator
(`for`) should make that straightforward.

> So under 1.22.4 this is what the overview bookmark in the pdf looked
> like:-
> 
> 96 0 obj
> <<
> /Dest /pdf:bm23
> /Parent 93 0 R
> /Title (7. Comparison of Tps/mpdfmark with Tpdf/mom)
> /Prev 109 0 R 
> >>
> endobj
> 
> Obviously, using .asciify is not the answer, particularly since each
> unicode character (\[u]) is a node which can't be asciified, so
> gets dropped.

Right.  `asciify` promises things it can't deliver, unless you're
already a major expert and manage your expectations.

> So in the latest version of pdf.tmac, not incorporated by Branden yet,
> the use of asciify has been dropped and the complete, raw string is
> passed to the output driver, so it becomes gropdf's job to make sense
> of the bookmark. The grout output looks like:-
> 
> x X ps:exec [/Dest /pdf:bm23 /Title (7. Comparison of \-Tps\h'(\En[.ps]u/
> \E*[$KERN_UNIT]u*4u)'/\h'(\En[.ps]u/\E*[$KERN_UNIT]u*2u)'\-mpdfmark with \-
> Tpdf\h'(\En[.ps]u/\E*[$KERN_UNIT]u*4u)'/\h'(\En[.ps]u/\E*[$KERN_UNIT]u*2u)'\-
> mom) /Level 2 /OUT pdfmark

Pretty wild stuff.  I wonder what we can do to drop all the stuff that
the device control command won't have any use for _before_ packing it in
there.  Might need a macro that iterates through a string and drops all
nodes from it.  That's going to need 2 of the new features I have in
mind.

> But when gropdf writes the pdf it contains:-
> 
> 96 0 obj << /Dest /pdf:bm23
> /Parent 75 0 R 
> /Prev 91 0 R 
> /Title (7. Comparison of -Tps/-mpdfmark with -Tpdf/-mom)
> >>
> endobj

It appears that you're doing a lot of cleanup work in gropdf that I'd
prefer you didn't have to do.

> Which you can see is a more accurate rendition of what the bookmark
> should be. 

Very much so.

> The new pdf.tmac with the now released gropdf successfully handles all
> unicode (\[u]), groff named glyphs (i.e. \[em] or \(em), and even
> \N'233' type, when they are passed to the output driver.  This means
> that passing unicode in device controls is not an issue at all, no
> need to invent a new way, just using the well established convention
> of using \[u] for the unicode characters, which preconv provides.

I didn't realize that.  This is great news!  I didn't want to _make_ you
handle the \[u] convention, not realizing you had already done the
work to support it.

It sounds like we're reading from the same hymnal on this issue.

The important difference with respect to the Subject: line is that, _if_
someone was using \X to construct these device control commands, then
where formerly they would be saying something like (please excuse my
pidgin PDF)

\X'ps: exec [/Dest /pdf:bm23 /Author Ephraim Bar-B\\[u0065_0301]cue'

...to pass through the Unicode composite character, if the community
ratifies (or silently assents to) my proposal to make \X read its
argument in copy mode, the extra escape character will stop being
necessary, just as it would not be necessary in a string or macro
definition (which are also read in copy mode).

\X'ps: exec [/Dest /pdf:bm23 /Author Ephraim Bar-B\[u0065_0301]cue'

I reckon the affected audience here is small, and possibly restricted to
participants in this thread.  But I of course will NEWS item it.

Still, I'll give people a bit longer to comment and opine.  If I get
antsy I can always push to a branch.

Regards,
Branden


signature.asc
Description: PGP signature


Re: Proposed: make \X read its argument in copy mode

2024-01-20 Thread Deri
On Saturday, 20 January 2024 00:56:34 GMT G. Branden Robinson wrote:
> Hi Deri,
> 
> At 2024-01-20T00:07:21+, Deri wrote:
> > On Friday, 19 January 2024 21:39:57 GMT G. Branden Robinson wrote:
> > > Right.  Before I craft a lengthy response to this--did you see the
> > > footnote?
> > 
> > Yes, sorry, it didn't help. I'm just comparing output now with output
> > in 1.23.0 and what you claim you are doing is the reverse of what I'm
> > seeing.
> 
> I haven't yet pushed anything implementing my (new) intentions,
> reflected in the subject line.  I wanted to gather feedback first.
> 
> What happened was, I thought "the `device` request and `\X` escape
> sequence should behave the same, modulo the usual differences in parsing
> (delimitation vs. reading the rest of the line, the leading double quote
> mechanism in request form, and so forth)".
> 
> Historically, that has never been the case in groff.
> 
> Here's (the meat of) the actual test case I recently wrote and pushed.
> 
> input='.nf
> \X#bogus1: esc \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]#
> .device bogus1: req \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
> .ec @
> @X#bogus2: esc @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]#
> .device bogus2: req @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]'
> 
> I know that looks hairy as hell.  I'm testing several things.
> 
> Here is what the output of that test looks like on groff 1.22.3 and
> 1.22.4.
> 
> x X bogus1: esc man-beast\[u1F00] -
> x X bogus1: req @%man-beast\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
> x X bogus2: esc man-beast@[u1F00] -
> x X bogus2: req @%man-beast@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]
> 
> Observations of the above:
> 
> A.  When using `\X`, the escape sequences \%, \[u1F63c], \[aq], \[dq],
> \[ga], \[ha], \[rs], \[ti] all get discarded.
> 
> B.  When you change the escape character and self-quote it in the
> formatter, it comes out as-is in the device control command.  I
> found this absurd, since there is no such thing as an escape
> character in the device-independent output language, and whatever
> escaping convention a device-specific control command needs to come
> up with for things like, oh, expressing Unicode code points is
> necessarily independent of a random *roff document's choice of
> escape character anyway.
> 
> Here is what the test output looks like on groff 1.23.0.  It enabled a
> few more characters to get rendered in PDF bookmarks.
> 
> x X bogus1: esc man-beast\[u1F00] -'"`^\~
> x X bogus1: req @%man-beast\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
> x X bogus2: esc man-beast@[u1F00] -'"`^\~
> x X bogus2: req @%man-beast@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]
> 
> Here is what the test output looks like on groff Git HEAD.  It was my
> first stab at solving the problem, the one I am now having partial
> second thoughts about.
> 
> x X bogus1: esc man-beast\[u1F00] -'"`^\~
> x X bogus1: req man-beast\[u1F00] -'"`^\~
> x X bogus2: esc man-beast\[u1F00] -'"`^\~
> x X bogus2: req man-beast\[u1F00] -'"`^\~
> 
> I was briefly happy with this, but I started wondering what happens when
> you interpolate any crazy old damned string inside a device control
> command and I rapidly became uncomfortable.  Because `\X` does not read
> its argument in copy mode, it can get exposed to "nodes" (and in groff
> Git, `device` can too)--this is that old incomprehensible nemesis that
> afflicted pdfmom users relentlessly before 1.23.0.[1][2][3][4][5][6]
> 
>   can't transparently output node at top level
> 
> But the reason 1.23.0 doesn't throw these errors is because I hid them,
> not because we fixed them.[7]

Hi Branden,

It might be worth clarifying what caused this error to appear (before you
suppressed it in 1.23.0). A particularly "fruity" bookmark appears in the mom 
example file mom-pdf.mom. It uses:-

.HEADING 1 \
"Comparison of \-Tps\*[FU4]/\*[FU2]\-mpdfmark with \-Tpdf\*[FU4]/\*[FU2]\-mom

Which after expansion becomes this:-

7. Comparison of \-Tps\h'(\En[.ps]u/\E*[$KERN_UNIT]u*4u)'/\h'(\En[.ps]u/
\E*[$KERN_UNIT]u*2u)'\-mpdfmark with \-Tpdf\h'(\En[.ps]u/
\E*[$KERN_UNIT]u*4u)'/\h'(\En[.ps]u/\E*[$KERN_UNIT]u*2u)'\-mom

And this is passed to .pdfbookmark! In the version of pdf.tmac used until now,
this monstrous string is run through .asciify to produce:-

7. Comparison of Tps/mpdfmark with Tpdf/mom

You can see that all the "\-" are missing, .asciify left them as nodes, and 
each of them would elicit the error. So under 1.22.4 this is what the overview 
bookmark in the pdf looked like:-

96 0 obj
<<
/Dest /pdf:bm23
/Parent 93 0 R
/Title (7. Comparison of Tps/mpdfmark with Tpdf/mom)
/Prev 109 0 R 
>>
endobj

Obviously, using .asciify is not the answer, particularly since each unicode 
character (\[u]) is a node which can't be asciified, so gets dropped. So 
in the latest version of pdf.tmac, not incorporated by Branden yet, the use of
asciify has been dropped and the complete, raw string is passed to the output
driver, so it becomes gropdf's job to make sense of the bookmark.

Re: Proposed: make \X read its argument in copy mode

2024-01-20 Thread Deri
On Saturday, 20 January 2024 01:39:21 GMT G. Branden Robinson wrote:
> [self-follow-up with correction]
> 
> At 2024-01-19T18:56:37-0600, G. Branden Robinson wrote:
> > This might be more accurately stated as:
> > 
> > 2) \X behaves like .device used to (in groff 1.23.0 and earlier).
> 
> [correction follows]
> And I repeat: this is _NOT_ a _hard_ prerequisite to expressing Unicode
> sequences in the output, but it seems useful so that authors of output
> drivers (and supporting macro files for them) can keep their sanity.
> 
> [elaboration]
> 
> What I mean is that we can pass Unicode between "pdf.tmac" and the
> output driver _today_.  Consider the following notional macro.
> 
> .de pdfmark2
> . nop \!x X ps:exec [\\$* pdfmark2
> ..
> 
> (The open bracket has something to do with PostScript syntax, I think.)
> 
> ...and it getting called by some other macro encoding the argument...
> 
> .de pdflink
> .  ds pdf*input \\$*\"
> .  encode pdf*input \" performs magic transformation, like "stringhex"
> .  pdfmark2 \\*[pdf*input]
> ..
> 
> ...and I have document using these.
> 
> .H 1 "This is my heading"
> .pdflink "HI DERI "
> 
> This ultimately would show up in the output as something like this.
> 
> x X ps: exec [4849204445524920F09F9888 pdfmark2
> 
> Something pretty close to that works on the deri-gropdf-ng branch today,
> as I understand it.

Hi Branden,

I'm afraid this is all wrong (or at least out of date, my private branch, 
which is rebased against a very recent HEAD, does not use stringhex as part of 
the interface with gropdf, it only uses it to build register names which need 
to include unicode characters within the name). In fact you know all this
since you recently wrote:-

"Deri's right that his `stringhex` solution, and the underlying problem it
solves, aren't fundamentally about how the formatter talks to the device
driver (though that is ultimately a necessary step)", the bit in brackets is 
wrong.

As an example, if this was in a file.mom:-

.HEADING 1 "Гуляйпольщина или Махновщина"

After running through preconv the resultant grout is:-

x X ps:exec [/Dest /pdf:bm24 /Title (8. \[u0413]\[u0443]\[u043B]\[u044F]\
[u0439]\[u043F]\[u043E]\[u043B]\[u044C]\[u0449]\[u0438]\[u043D]\[u0430] \
[u0438]\[u043B]\[u0438] \[u041C]\[u0430]\[u0445]\[u043D]\[u043E]\[u0432]\
[u0449]\[u0438]\[u043D]\[u0430]) /Level 2 /OUT pdfmark

And the entry in the pdf looks like this:-

99 0 obj << /Dest /pdf:bm24
/Next 100 0 R
/Parent 77 0 R
/Prev 98 0 R
/Title 
(\376\377\0\70\0\56\0\40\4\23\4\103\4\73\4\117\4\71\4\77\4\76\4\73\4\114\4\111\4\70\4\75\4\60\0\40\4\70\4\73\4\70\0\40\4\34\4\60\4\105\4\75\4\76\4\62\4\111\4\70\4\75\4\60)
>>
endobj

The preconv unicodes have been converted to octal bytes with a UTF-16 BOM on 
the front, and a pdf viewer will show the string with unicode characters in 
its bookmark panel. No stringhex involved, just passing preconv output 
straight to gropdf.

> But my _suggestion_ would be that we support something more like this.
> 
> x X ps: exec [HI DERI \[u00F0]\[u009F]\[u0098]\[u0088] pdfmark2
> 
> or this...
> 
> x X ps: exec [HI DERI \[uDE08]\[uD83D] pdfmark2
> 
> ...or even this...
> 
> x X ps: exec [HI DERI \[u1F608] pdfmark2
> 
> These are groffish ways of expressing UTF-8, UTF-16LE, and UTF-32,
> respectively.  The reuse of groff Unicode code point escape sequence
> syntax is, I would hope, more helpful than confusing.

This is exactly the technique I am now using. Whatever preconv produces ends
up as a UTF-16 string. You can mix normal text with the preconv output (and
groff characters like \[em]), but as soon as any character in the string 
requires unicode the whole string is converted.

Cheers

Deri
> My concerns are that (1) people don't have to use two different escaping
> conventions _within the formatter_ to get byte sequences to the output
> driver, and (2) that driver-supporting macro file writers don't have to
> handle a bunch of special cases in device control commands.
> 
> Those factors are what drive my proposal.
> 
> Regards,
> Branden