On Wednesday 28 Mar 2012 16:02:01 Werner LEMBERG wrote:
> > This is a (painful) limitation of Adobe's pdfmark specification:
> > only a rather limited set of characters is permitted within the text
> > which is specified to describe a bookmark.
>
> This is not correct, AFAIK.  There are two encodings for pdfbookmarks,
> namely PDFDocEncoding and Unicode.  So it should certainly be possible
> to use Czech characters, but apparently groff's pdfmark package
> doesn't support Unicode bookmarks.
>
> Deri, what about gropdf?
>
>
>     Werner

Hi Werner,

You are correct that full UTF-16 is supported for annotations. The problem
is that by the time the string is passed to pdfbookmark the characters have
been changed to named glyph nodes, which I believe can't be converted back
to their UTF-16 character code (e.g. \[u0159]) within a macro, so I'm in the
same boat as Keith.

In order to do this I think we'd need help from troff, something like
.asciify16hex, which would return the string as a BOM followed by the
two-byte Unicode value for each character, e.g. 00 41 01 59 (A rcaron).
This could then be passed on to the PDF enclosed in '<>', with a BOM on the
front, instead of enclosing the text in '()'. Even being able to
reconstitute \[u0159] would be helpful for gropdf, since it could then build
the hex string itself.

I've been looking into .asciify in a bit more detail (in preparation for the
documentation patch you asked for). Please can you confirm I've got this
correct:

Node                        Action
====                        ========================
line_start_node             deleted
space_node                  If was_escape_colon return ESCAPE_COLON else return node
word_space_node             return space(s)
unbreakable_space_node      return ESCAPE_TILDE
diverted_space_node         Ignored
diverted_copy_file_node     Ignored
extra_size_node             Ignored
vertical_size_node          deleted
hmotion_node                If was_tab return tab else return node
space_char_hmotion_node     return ESCAPE_SPACE
vmotion_node                Ignored
hline_node                  Ignored
vline_node                  Ignored
zero_width_node             Ignored
left_italic_corrected_node  deleted
overstrike_node             Ignored
bracket_node                Ignored
draw_node                   Ignored
glyph_node                  If asciify_code or ascii_code not 0 return chr() else return node
ligature_node               deleted
kern_pair_node              deleted
dbreak_node                 deleted
italic_corrected_node       deleted

My C++ foo is not strong, but I suspect the nodes marked as Ignored (which
have no specific asciify method) inherit the generic node method, which is
to return the node.

It can be seen from the above that in several cases the asciified
string/diversion will still hold nodes as well as ASCII characters. Does
this look correct, Werner?

As regards gropdf handling the Czech example given, that seems to work
perfectly with fonts which contain the needed characters, although I did fix
a problem in this area quite recently, so I owe you a patch for this.

Cheers

Deri
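
For illustration, here is a minimal sketch of the hex-string form discussed above: a BOM followed by the UTF-16BE bytes of each character, enclosed in '<>' rather than '()'. It is written in Python only to show the byte layout. The function name pdf_hex_string is hypothetical, and this is not how groff, the pdfmark macros, or gropdf currently build bookmark text.

```python
import codecs


def pdf_hex_string(text: str) -> str:
    """Encode text as a PDF hex string: '<' + BOM + UTF-16BE bytes + '>'.

    Hypothetical helper for illustration only; it is not part of groff
    or gropdf.
    """
    # FE FF is the big-endian byte order mark that flags a Unicode text string.
    data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


# "A" followed by r-caron (U+0159), the example bytes 00 41 01 59 above.
print(pdf_hex_string("A\u0159"))  # <FEFF00410159>
```

Characters outside the Basic Multilingual Plane come out as surrogate pairs (four bytes), so a real implementation could not assume exactly two bytes per character.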