> You are correct that full UTF-16 is supported for annotations, the
> problem is that by the time the string is passed to pdfbookmark the
> characters have been changed to named glyph nodes which I believe
> can't be converted back to their UTF-16 character code
> (i.e. \[u0159]) within a macro, [...]
\X allows \[...] if `use_charnames_in_special' is set in the DESC
file.  This might help for gropdf, which can then convert such
entities to proper PDF string literals.  BTW, `.device' doesn't have
this restriction, so

  .device \[foo]

gets happily emitted as `x X \[foo]' even without
`use_charnames_in_special'.

> In order to do this I think we'd need help from troff, something
> like .asciify16hex which would return the string as a BOM followed
> by the two byte unicode for each character, i.e. 00 41 01 59 (A
> rcaron)

You mean that this hypothetical call

  .asciify16hex A\[u0159]

should return the string `00410159', right?

> ... this could then be passed onto the pdf enclosed in '<>' with a
> BOM on the front instead of enclosing the text in '()'.

Why do you need a Byte Order Mark?  Note, however, that you actually
need UTF-16BE encoding for PDF string literals, IIRC, so Unicode
values larger than U+FFFF must be represented as surrogate pairs.

> Even being able to reconstitute \[u0159] would be helpful for
> gropdf, since it could then build the hex string itself.

What exactly do you mean by `reconstitute'?

> I've been looking into .asciify in a bit more detail (in preparation
> for the documentation patch you asked for).  Please can you confirm
> I've got this correct: [...]

Looks fine.

> My C++ foo is not strong but I suspect the nodes marked as ignored
> (which have no specific asciify method) inherit the generic node
> method which is to return the node.

Correct.

> It can be seen from the above that in several cases the asciified
> string/diversion will still hold nodes as well as ascii characters.

Correct.


    Werner
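P.S.  Just to make the encoding concrete: the conversion such a
hypothetical `.asciify16hex' request (or gropdf itself) would have to
perform can be sketched in Python -- the function name `utf16be_hex'
is mine, not anything that exists in groff:

```python
def utf16be_hex(s: str) -> str:
    """Encode a string as an uppercase UTF-16BE hex digit sequence.

    Code points above U+FFFF are split into surrogate pairs
    (high surrogate D800-DBFF, low surrogate DC00-DFFF), as
    required for UTF-16BE in PDF text strings.
    """
    out = []
    for ch in s:
        cp = ord(ch)
        if cp <= 0xFFFF:
            out.append(f"{cp:04X}")
        else:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)    # top 10 bits
            low = 0xDC00 + (cp & 0x3FF)   # bottom 10 bits
            out.append(f"{high:04X}{low:04X}")
    return "".join(out)

print(utf16be_hex("A\u0159"))     # 00410159  (A, r caron)
print(utf16be_hex("\U0001D11E"))  # D834DD1E  (outside the BMP)
```

gropdf could then wrap such a digit string in `<...>' to get a PDF
hex string literal.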