Re: [Groff] typesetting Czech with custom fonts

Deri James Wed, 28 Mar 2012 18:16:24 -0700

On Wednesday 28 Mar 2012 16:02:01 Werner LEMBERG wrote:
> > This is a (painful) limitation of Adobe's pdfmark specification:
> > only a rather limited set of characters is permitted within the text
> > which is specified to describe a bookmark.
> 
> This is not correct, AFAIK.  There are two encodings for pdfbookmarks,
> namely PDFDocEncoding and Unicode.  So it should certainly be possible
> to use Czech characters, but apparently groff's pdfmark package
> doesn't support Unicode bookmarks.
> 
> Deri, what about gropdf?
> 
> 
>     Werner


Hi Werner,

You are correct that full UTF-16 is supported for annotations. The problem is 
that by the time the string is passed to pdfbookmark the characters have been 
changed to named glyph nodes, which I believe can't be converted back to their 
UTF-16 character code (i.e. \[u0159]) within a macro, so I'm in the same boat 
as Keith. To do this I think we'd need help from troff, something like 
.asciify16hex, which would return the string as a BOM followed by the two-byte 
Unicode value of each character, i.e. 00 41 01 59 (A rcaron). This could then 
be passed on to the PDF enclosed in '<>', with the BOM on the front, instead 
of enclosing the text in '()'. Even being able to reconstitute \[u0159] would 
be helpful for gropdf, since it could then build the hex string itself.
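
To make the hex-string idea concrete, here is a minimal sketch in Python 
(gropdf itself is not written in Python; this is only an illustration). It 
assumes the bookmark text arrives with \[uXXXX] escapes intact, which is 
exactly what is not available today. The names expand_escapes and 
pdf_hex_string are made up for this example, and only simple escapes of 4 to 
6 hex digits are handled.

```python
import re

# Matches groff-style Unicode escapes such as \[u0159] (4 to 6 hex digits).
# Composite escapes like \[u0041_0301] are deliberately not handled here.
_UNI_ESCAPE = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")


def expand_escapes(text):
    """Replace each \\[uXXXX] escape with the character it names."""
    return _UNI_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def pdf_hex_string(text):
    """Return text as a PDF hex string: '<', BOM, UTF-16BE bytes, '>'."""
    data = b"\xfe\xff" + expand_escapes(text).encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


print(pdf_hex_string(r"A\[u0159]"))  # prints <FEFF00410159>
```

The FE FF prefix is the BOM, and 00 41 01 59 are the two characters from the 
example above.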

I've been looking into .asciify in a bit more detail (in preparation for the 
documentation patch you asked for). Please can you confirm I've got this 
correct:

Node                          Action
====                          ======

line_start_node               deleted
space_node                    If was_escape_colon return ESCAPE_COLON
                              else return node
word_space_node               return space(s)
unbreakable_space_node        return ESCAPE_TILDE
diverted_space_node           Ignored
diverted_copy_file_node       Ignored
extra_size_node               Ignored
vertical_size_node            deleted
hmotion_node                  If was_tab return tab else return node
space_char_hmotion_node       return ESCAPE_SPACE
vmotion_node                  Ignored
hline_node                    Ignored
vline_node                    Ignored
zero_width_node               Ignored
left_italic_corrected_node    deleted
overstrike_node               Ignored
bracket_node                  Ignored
draw_node                     Ignored
glyph_node                    If asciify_code or ascii_code not 0 return
                              chr() else return node
ligature_node                 deleted
kern_pair_node                deleted
dbreak_node                   deleted
italic_corrected_node         deleted

My C++ fu is not strong, but I suspect the nodes marked as ignored (which have 
no specific asciify method) inherit the generic node method, which is to 
return the node.

It can be seen from the above that in several cases the asciified 
string/diversion will still hold nodes as well as ASCII characters.
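
As a rough illustration of that mixed result, here is a toy Python model of a 
few rows of the table as written. It is not groff's C++ code, and it does not 
settle whether the table is right. The (kind, payload) tuples and the asciify 
function are invented for the sketch; a glyph_node payload of None stands for 
a glyph with no usable character code.

```python
# Toy model of part of the table above, NOT groff's actual implementation.
# Each node is a hypothetical (kind, payload) tuple.
DELETED = {
    "line_start_node", "vertical_size_node", "left_italic_corrected_node",
    "ligature_node", "kern_pair_node", "dbreak_node", "italic_corrected_node",
}


def asciify(nodes):
    """Apply the table's rules to a list of (kind, payload) nodes."""
    out = []
    for kind, payload in nodes:
        if kind in DELETED:
            continue                      # node is dropped
        if kind == "word_space_node":
            out.append(" ")               # becomes a plain space
        elif kind == "glyph_node" and payload is not None:
            out.append(payload)           # glyph with a known character code
        else:
            out.append((kind, payload))   # node is returned unchanged
    return out


result = asciify([
    ("line_start_node", None),
    ("glyph_node", "A"),
    ("word_space_node", None),
    ("glyph_node", None),   # e.g. rcaron, which has no ASCII code
    ("hline_node", None),
])
print(result)
```

The result holds the characters "A" and " " followed by two surviving nodes, 
which is the situation that makes the output awkward to use in a bookmark.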

Does this look correct, Werner?

As regards gropdf handling the Czech example given, that seems to work 
perfectly with fonts which contain the needed characters, although I did 
fix a problem in this area quite recently, so I owe you a patch for this.

Cheers 

Deri
