Follow-up Comment #26, bug #66653 (group groff):
[comment #21 comment #21:]
> I have also included a screen shot of the pdf produced. You can see the
> missing characters in the document and the same diversion used in the
> bookmark has the base character only.
Yeah, something funny is going on there. I'll return to that below.
> The most interesting thing to note is that the original string register
> \*[khant] retains the original input value when used in a bookmark (I was
> super-pleased when you got that working).
Thanks!
> If you want to experiment with a font you have installed, try:-
> .ft U-TR
> .sp 1i0
> .ds khant "time to meet the Şhaka Khan
> At the top of the file and include "-Kutf8" on the command line.
It also works to just use the corresponding Unicode escape sequence directly.
Either way it gets decomposed.
The thing that baffles me is that, after decomposition, the base and combining
characters get reversed in order when they go to the output. This doesn't
happen with (some) other complex characters. It _does_ happen with U+0102.
It's a consequence of post-1.23.0 changes, though.
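For anyone who wants to check the decomposition independently of the formatter, here is a minimal sketch in Python. It relies only on the standard `unicodedata` module. The helper name `groff_style_name` is made up for illustration; it mimics the `uXXXX_YYYY` spelling that shows up in the `-a` output below, and is not anything groff itself provides.

```python
import unicodedata


def groff_style_name(ch):
    """Spell the canonical (NFD) decomposition of ch as uXXXX_YYYY.

    Hypothetical helper: it imitates the naming seen in 'groff -a'
    output, base character first, combining characters after it.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    return "u" + "_".join("%04X" % ord(c) for c in decomposed)


if __name__ == "__main__":
    for ch in ("\u015E", "\u0102"):
        print("U+%04X -> %s" % (ord(ch), groff_style_name(ch)))
    # U+015E -> u0053_0327
    # U+0102 -> u0041_0306
```

In both cases the base letter comes first and the combining character second, which is the order 1.23.0 reports.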
$ cat ATTIC/66653-7-undivert-weird-character.groff
.box DIV
\[u015E]
.br
.box
.\"pm DIV
.DIV
.asciify DIV
.\"tm *** UNDIVERTED NOW ***
.\"pm DIV
.DIV
$ ~/groff-1.23.0/bin/groff -a ATTIC/66653-7-undivert-weird-character.groff
<beginning of page>
<u0053_0327> <u0053_0327>
Here's the behavior on HEAD (Savannah as of , not my working copy).
$ ~/groff-HEAD/bin/groff -a ATTIC/66653-7-undivert-weird-character.groff
<beginning of page>
<ac>S <ac>S
Possibly this has to do with node list reversal. If we restore the
commented-out requests (1.23.0 doesn't support the interesting ones), we see
the following.
$ ~/groff-HEAD/bin/groff -a ATTIC/66653-7-undivert-weird-character.groff
{"name": "DIV", "file name": "ATTIC\/66653-7-undivert-weird-character.groff",
"starting line number": 1, "length": 5, "contents":
"\u0000\u0000\u0000\u0000\n", "node list": [{"type": "line_start_node",
"diversion level": 0, "is_special_node": false}, {"type": "composite_node",
"diversion level": 0, "is_special_node": false, "special character":
"u0053_0327", "contents": [{"type": "line_start_node", "diversion level": 0,
"is_special_node": false},
{"type": "zero_width_node", "diversion level": 0, "is_special_node": false,
"contains": [{"type": "glyph_node", "diversion level": 0, "is_special_node":
false, "special character": "ac"}]},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "S"}]
}, {"type": "vertical_size_node", "diversion level": 0, "is_special_node":
false, "vunits": -12000}, {"type": "vertical_size_node", "diversion level": 0,
"is_special_node": false, "vunits": 0}]}
<beginning of page>
*** UNDIVERTED NOW ***
{"name": "DIV", "file name": "ATTIC\/66653-7-undivert-weird-character.groff",
"starting line number": 7, "length": 2, "contents": "\u0000\n", "node list":
[{"type": "composite_node", "diversion level": 0, "is_special_node": false,
"special character": "u0053_0327", "contents": [{"type": "line_start_node",
"diversion level": 0, "is_special_node": false},
{"type": "zero_width_node", "diversion level": 0, "is_special_node": false,
"contains": [{"type": "glyph_node", "diversion level": 0, "is_special_node":
false, "special character": "ac"}]},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "S"}]
}]}
<ac>S <ac>S
Feeding the JSONic stuff to _jq_(1), we get:
{
  "name": "DIV",
  "file name": "ATTIC/66653-7-undivert-weird-character.groff",
  "starting line number": 1,
  "length": 5,
  "contents": "\u0000\u0000\u0000\u0000\n",
  "node list": [
    {
      "type": "line_start_node",
      "diversion level": 0,
      "is_special_node": false
    },
    {
      "type": "composite_node",
      "diversion level": 0,
      "is_special_node": false,
      "special character": "u0053_0327",
      "contents": [
        {
          "type": "line_start_node",
          "diversion level": 0,
          "is_special_node": false
        },
        {
          "type": "zero_width_node",
          "diversion level": 0,
          "is_special_node": false,
          "contains": [
            {
              "type": "glyph_node",
              "diversion level": 0,
              "is_special_node": false,
              "special character": "ac"
            }
          ]
        },
        {
          "type": "glyph_node",
          "diversion level": 0,
          "is_special_node": false,
          "character": "S"
        }
      ]
    },
    {
      "type": "vertical_size_node",
      "diversion level": 0,
      "is_special_node": false,
      "vunits": -12000
    },
    {
      "type": "vertical_size_node",
      "diversion level": 0,
      "is_special_node": false,
      "vunits": 0
    }
  ]
}
{
  "name": "DIV",
  "file name": "ATTIC/66653-7-undivert-weird-character.groff",
  "starting line number": 7,
  "length": 2,
  "contents": "\u0000\n",
  "node list": [
    {
      "type": "composite_node",
      "diversion level": 0,
      "is_special_node": false,
      "special character": "u0053_0327",
      "contents": [
        {
          "type": "line_start_node",
          "diversion level": 0,
          "is_special_node": false
        },
        {
          "type": "zero_width_node",
          "diversion level": 0,
          "is_special_node": false,
          "contains": [
            {
              "type": "glyph_node",
              "diversion level": 0,
              "is_special_node": false,
              "special character": "ac"
            }
          ]
        },
        {
          "type": "glyph_node",
          "diversion level": 0,
          "is_special_node": false,
          "character": "S"
        }
      ]
    }
  ]
}
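To make the stored glyph order easier to see than it is by eyeballing the dump, here is a rough sketch in Python that walks a node list and collects the glyphs in the order they appear. The `DUMP` string is trimmed by hand from the second dump above (the "diversion level" and "is_special_node" fields are dropped); the key names come from that dump, while `glyphs` and `glyph_order` are hypothetical helpers, not part of groff.

```python
import json

# Hand-trimmed copy of the post-asciify dump; only the keys needed
# to recover glyph order are kept.
DUMP = r'''
{"name": "DIV", "node list": [
  {"type": "composite_node", "special character": "u0053_0327",
   "contents": [
     {"type": "line_start_node"},
     {"type": "zero_width_node",
      "contains": [{"type": "glyph_node", "special character": "ac"}]},
     {"type": "glyph_node", "character": "S"}]}]}
'''


def glyphs(node):
    """Yield glyph names depth first, in the order the dump stores them."""
    if node.get("type") == "glyph_node":
        yield node.get("character") or node.get("special character")
    # Nested nodes appear under "contents" or "contains" in the dump.
    for key in ("contents", "contains"):
        children = node.get(key, [])
        if isinstance(children, list):
            for child in children:
                yield from glyphs(child)


def glyph_order(dump):
    """Collect the glyphs of every node in the dump's "node list"."""
    order = []
    for node in dump["node list"]:
        order.extend(glyphs(node))
    return order


if __name__ == "__main__":
    print(glyph_order(json.loads(DUMP)))  # ['ac', 'S']
```

The accent glyph precedes the base "S" inside the composite node, matching the `<ac>S` seen in the `-a` output on HEAD.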
> This remaining problem concerning composite characters is not a show
> stopper,
I'd still like to defeat it. But I may need to let it go for this week and
push a bunch of other changes I have prepared, including the ones that
deliberately discard dummy nodes ("transparent" and otherwise).
> it only raises its ugly head when we add pdf features to ms (me?) in 1.25.
If I can get all the within-formatter obstacles cleared out of the way, then
we could enable widespread experimentation with replacements of "s.tmac", or
supplements to it, that everyone with a stock _groff_ 1.24 can play with
easily.
That'd be nice to have. But it will depend on this decomposed character issue
falling to a second attempt in the near term.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?66653>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
