Re: [Groff] typesetting Czech with custom fonts
> You are correct that full UTF-16 is supported for annotations, the
> problem is that by the time the string is passed to pdfbookmark the
> characters have been changed to named glyph nodes which I believe
> can't be converted back to their UTF-16 character code
> (i.e. \[u0159]) within a macro, [...]

\X allows \[...] if `use_charnames_in_special' is set in the DESC file. This might help for gropdf, which can then convert such entities to proper PDF string literals.

BTW, `.device' doesn't have this restriction, so

  .device \[foo]

gets happily emitted as

  x X \[foo]

even without `use_charnames_in_special'.

> In order to do this I think we'd need help from troff, something
> like .asciify16hex which would return the string as a BOM followed
> by the two byte unicode for each character, i.e. 00 41 01 59
> (A rcaron)

You mean this hypothetical call

  .asciify16hex A\[u0159]

should return the string `00410159', right?

> ... this could then be passed onto the pdf enclosed in '<>' with a
> BOM on the front instead of enclosing the text in '()'.

Why do you need a Byte Order Mark? Note, however, that you actually need UTF-16BE encoding for PDF literals, IIRC, so Unicode values larger than U+FFFF must be represented as surrogate pairs.

> Even being able to reconstitute \[u0159] would be helpful for
> gropdf, since it could then build the hex string itself.

What exactly do you mean with `reconstitute'?

> I've been looking into .asciify in a bit more detail (in
> preparation for the documentation patch you asked for). Please can
> you confirm I've got this correct: [...]

Looks fine.

> My c++ foo is not strong but I suspect the nodes marked as ignored
> (which have no specific asciify method) inherit the generic node
> method which is to return the node.

Correct.

> It can be seen from the above that in several cases the asciified
> string/diversion will still hold nodes as well as ascii characters.

Correct.


    Werner
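For readers following the encoding discussion, here is a minimal Python sketch of the hex-string form being talked about. It is not groff or gropdf code; the function name pdf_hex_string is made up for this illustration. It assumes the usual PDF convention that a text string starting with the bytes FE FF is read as UTF-16BE (otherwise as PDFDocEncoding), which is one possible answer to the "why a BOM?" question above.

```python
def pdf_hex_string(text, with_bom=True):
    """Return `text` as a PDF-style hex string of UTF-16BE code units.

    Hypothetical helper for illustration only; not part of groff or gropdf.
    """
    # Python's utf-16-be codec emits surrogate pairs for code points
    # above U+FFFF, which is what UTF-16BE requires.
    data = text.encode("utf-16-be")
    if with_bom:
        # FE FF marks the string as UTF-16BE rather than PDFDocEncoding.
        data = b"\xfe\xff" + data
    return "<" + data.hex().upper() + ">"


if __name__ == "__main__":
    # "A" followed by r-caron (U+0159), the example used in the thread.
    print(pdf_hex_string("A\u0159"))
```

Without the BOM, the same input gives the `00410159' payload Werner quotes; a character outside the Basic Multilingual Plane comes out as two 16-bit units.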
Re: [Groff] typesetting Czech with custom fonts
Dear Werner,

On Thu, Mar 29, 2012 at 06:53, Werner LEMBERG w...@gnu.org wrote:

>   # generate.pe
>   Open($1);
>   Generate($fontname + ".pfa");  # this also generates the .afm file
>   Generate($fontname + ".t42");
>
> Call this with e.g.
>
>   fontforge -script generate.pe GS_CE_.TTF

Fontforge worked like magic, I now have all the characters I need. Thanks a lot.

Petr
Re: [Groff] typesetting Czech with custom fonts
> My question is whether this is caused by incorrect font conversion
> or if the problem lies somewhere else.

To help you, we need a minimal example which exposes the problem, together with all the necessary stuff (including fonts).


    Werner
Re: [Groff] typesetting Czech with custom fonts
> This is a (painful) limitation of Adobe's pdfmark specification:
> only a rather limited set of characters is permitted within the
> text which is specified to describe a bookmark.

This is not correct, AFAIK. There are two encodings for pdfbookmarks, namely PDFDocEncoding and Unicode. So it should certainly be possible to use Czech characters, but apparently groff's pdfmark package doesn't support Unicode bookmarks.

Deri, what about gropdf?


    Werner
Re: [Groff] typesetting Czech with custom fonts
On Wed, Mar 28, 2012 at 15:29, Deri James d...@chuzzlewit.demon.co.uk wrote:

> Are you talking about missing from the bookmark outline panel or
> missing from the text of the document?
>
> Missing from outline = each \X warning indicates a character was
> dropped. (For the reason Keith gave).
>
> Missing from document = probable font problem. (And we'd need a
> minimal example as requested by Werner).
>
> Deri

Hi Deri,

Keith clarified that the errors came from pdfroff; I thought they were from groff directly. At this moment I don't really care about the bookmarks, all I want is to have the document display correctly.

Example text should read:

  Příliš žluťoučký kůň úpěl ďábelské ódy.

With iconv '-futf8' '-tlatin2', piped to groff, which doesn't complain:

  Píli lu»ouký k úpl ábelské ódy.

With no conversion, groff complains with "stdin:136: can't translate character code 195 to special character `~A' in transparent throughput":

  Pli luouÄk k pÄl Äbelsk© dy.

When I use -k -Dutf8:

  P liš žluouk k pl belsk dy.

This happens for me with the default fonts as well, not just the ones that I have converted. I suspect groff doesn't know how to find the glyphs.

Petr
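A note on the "character code 195" warning above: 195 is 0xC3, which is the first byte of the UTF-8 encoding of accented Latin letters such as í, á, é, ó, ú and ý. The short Python sketch below shows what a program that treats its input as one byte per character would see when handed UTF-8 text. The helper names are invented for this illustration, and it is only a plausible reading of the warning, not a trace of what groff did.

```python
def utf8_byte_codes(text):
    """Return the byte values of `text` once encoded as UTF-8."""
    return list(text.encode("utf-8"))


def read_as_latin1(text):
    """Show how UTF-8 bytes look when each byte is taken as one Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    i_acute = "\u00ed"   # i with acute, as in "Příliš"
    r_caron = "\u0159"   # r with caron
    print(utf8_byte_codes(i_acute))
    print(utf8_byte_codes(r_caron))
    print(repr(read_as_latin1(i_acute)))
```

Every non-ASCII Czech letter becomes two bytes, so unconverted UTF-8 input reaches the formatter as pairs of unrelated 8-bit characters.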
Re: [Groff] typesetting Czech with custom fonts
> Example text should read:
>
>   Příliš žluťoučký kůň úpěl ďábelské ódy.
>
> with iconv '-futf8' '-tlatin2', pipe to groff, doesn't complain:
>
>   Píli lu»ouký k úpl ábelské ódy.

But this is not correct usage. groff internally uses latin1 encoding. If you really want to use latin2, you must explicitly load the proper macro package, which maps latin2 encoding to encoding-independent representation forms (\[..] constructs):

  cat cz \
  | iconv -f utf8 -t latin2 \
  | groff -mlatin2 -Tutf8

However, if you replace the `-Tutf8' backend with `-Tps', you get a bunch of warnings because the standard PS fonts don't have all necessary glyphs.

Instead of using an external iconv program or an old legacy encoding, I recommend groff's `preconv' preprocessor (option `-k' or `-K enc'), which converts input in various encodings into groff's internal character representation:

  cat cz \
  | groff -k -Tutf8

Much easier, much shorter.


    Werner
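To make "groff's internal character representation" a little more concrete: preconv's documented output is plain ASCII plus \[uXXXX] entities, where XXXX is a hexadecimal Unicode code point. The Python sketch below imitates only that final step for text that has already been decoded. The function name to_groff_entities is hypothetical, and the real preconv does considerably more (encoding detection, coding tags, and so on).

```python
def to_groff_entities(text):
    """Replace every non-ASCII character with a \\[uXXXX] entity.

    Rough imitation of the output form preconv produces; illustration only.
    """
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            pieces.append(ch)                      # ASCII passes through unchanged
        else:
            pieces.append("\\[u%04X]" % code_point)  # at least four hex digits
    return "".join(pieces)


if __name__ == "__main__":
    # The word "kůň" from the example sentence in this thread.
    print(to_groff_entities("k\u016f\u0148"))
```

Whether the resulting glyph names can actually be printed still depends on the fonts the output device has, which is the point of the `-Tps' remark above.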
Re: [Groff] typesetting Czech with custom fonts
> groff internally uses latin1 encoding.

Mhmm, bad wording. latin1 is just the default setup for all backends except -Tutf8.


    Werner
Re: [Groff] typesetting Czech with custom fonts
On Wed, Mar 28, 2012 at 18:27, Werner LEMBERG w...@gnu.org wrote:

> But this is not correct usage. groff internally uses latin1
> encoding.
>
> However, if you replace the `-Tutf8' backend with `-Tps', you get a
> bunch of warnings because the standard PS fonts don't have all
> necessary glyphs.
>
> Instead of using an external iconv program or an old legacy
> encoding, I recommend groff's `preconv' preprocessor (option `-k'
> or `-K enc') which converts input in various encodings into groff's
> internal character representation:
>
>   cat cz \
>   | groff -k -Tutf8
>
> Much easier, much shorter.

Werner, that is very much what I would like to use. And I tried, I actually use it always by default, but I still end up with missing characters. You didn't by any chance have a look at the file/fonts I mailed you off-list? I suspect I simply didn't convert them correctly, but I have no idea what could be wrong. As I previously wrote, I used the method from mom's manual.

Petr
Re: [Groff] typesetting Czech with custom fonts
On 28/03/12 16:09, Petr Man wrote:

> Keith clarified,

I didn't...

> that the errors came from pdfroff,

...because they don't.

> I thought they were from groff directly.

They are; specifically, when groff processes this...

  .nop \X'ps:exec [\\$* pdfmark'\c

...expression as it expands a .pdfmark macro invocation, in which $* contains any groff special character (such as the \(de I mentioned earlier).

Now, it may be that \X can pass Unicode code point data, but it is documented (in groff's texinfo manual) that it will not handle any groff escape (other than a select few which are simply ignored), so anything which needs an escape to express it would seem to be excluded from any pdfmark, such as is required to place a bookmark.

-- 
Regards,
Keith.
Re: [Groff] typesetting Czech with custom fonts
On Wednesday 28 Mar 2012 16:02:01 Werner LEMBERG wrote:

>> This is a (painful) limitation of Adobe's pdfmark specification:
>> only a rather limited set of characters is permitted within the
>> text which is specified to describe a bookmark.
>
> This is not correct, AFAIK. There are two encodings for
> pdfbookmarks, namely PDFDocEncoding and Unicode. So it should
> certainly be possible to use Czech characters, but apparently
> groff's pdfmark package doesn't support Unicode bookmarks.
>
> Deri, what about gropdf?
>
>     Werner

Hi Werner,

You are correct that full UTF-16 is supported for annotations, the problem is that by the time the string is passed to pdfbookmark the characters have been changed to named glyph nodes which I believe can't be converted back to their UTF-16 character code (i.e. \[u0159]) within a macro, so I'm in the same boat as Keith.

In order to do this I think we'd need help from troff, something like .asciify16hex which would return the string as a BOM followed by the two byte unicode for each character, i.e. 00 41 01 59 (A rcaron) ... this could then be passed onto the pdf enclosed in '<>' with a BOM on the front instead of enclosing the text in '()'. Even being able to reconstitute \[u0159] would be helpful for gropdf, since it could then build the hex string itself.

I've been looking into .asciify in a bit more detail (in preparation for the documentation patch you asked for).
Please can you confirm I've got this correct:-

  Node                          Action
  ----------------------------  ------------------------------------------
  line_start_node               deleted
  space_node                    If was_escape_colon return ESCAPE_COLON
                                else return node
  word_space_node               return space(s)
  unbreakable_space_node        return ESCAPE_TILDE
  diverted_space_node           Ignored
  diverted_copy_file_node       Ignored
  extra_size_node               Ignored
  vertical_size_node            deleted
  hmotion_node                  If was_tab return tab else return node
  space_char_hmotion_node       return ESCAPE_SPACE
  vmotion_node                  Ignored
  hline_node                    Ignored
  vline_node                    Ignored
  zero_width_node               Ignored
  left_italic_corrected_node    deleted
  overstrike_node               Ignored
  bracket_node                  Ignored
  draw_node                     Ignored
  glyph_node                    If asciify_code or ascii_code not 0
                                return chr() else return node.
  ligature_node                 deleted
  kern_pair_node                deleted
  dbreak_node                   deleted
  italic_corrected_node         deleted

My c++ foo is not strong but I suspect the nodes marked as ignored (which have no specific asciify method) inherit the generic node method which is to return the node.

It can be seen from the above that in several cases the asciified string/diversion will still hold nodes as well as ascii characters.

Does this look correct Werner?

As regards gropdf handling the Czech example given, that seems to work perfectly with fonts which contain the needed characters, although I did fix a problem in this area quite recently so I owe you a patch for this.

Cheers

Deri
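As an illustration of the "reconstitute \[u0159]" idea in this message: if a post-processor received the bookmark text with \[uXXXX] entities still intact, turning them back into characters would be a small parsing step. The Python sketch below shows one way that could look. It is not gropdf code, the names are hypothetical, and it only handles the simple single-code-point form (not composite names such as u0072_030C or named glyphs such as \[rs]).

```python
import re

# Matches \[uXXXX] with four to six hexadecimal digits.
_ENTITY = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")


def entities_to_text(source):
    """Turn \\[uXXXX] entities in `source` back into the characters they name."""
    return _ENTITY.sub(lambda match: chr(int(match.group(1), 16)), source)


if __name__ == "__main__":
    title = entities_to_text(r"A\[u0159]")
    # From here the text can be encoded however the output format needs,
    # e.g. as UTF-16BE bytes for a PDF text string.
    print(title.encode("utf-16-be").hex().upper())
```

Anything the pattern does not recognise is passed through untouched, so a real implementation would still need a policy for leftover escapes.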
Re: [Groff] typesetting Czech with custom fonts
On Thu, Mar 29, 2012, Werner Lemberg wrote:

>> As I previously wrote, I used the method from mom's manual.
>
> Interesting. I don't have time to verify the steps (and I don't
> know some of the involved programs), but you did it, and you
> failed. So maybe the instructions should be revised. Peter?

The mom instructions specify ttf2pt1 for the conversion from TrueType to Type1. I have on occasion had difficulty with it myself; I notice it's been dropped from some of the major distros. Werner is quite right to recommend fontforge.

And yes, the mom docs need to be revised. Deri James has been putting a lot of work into integrating the mom macros with pdf.tmac and the gropdf device. I'm waiting till that project is complete, whence I'll be updating contrib/mom, including the documentation.

-- 
Peter Schaffter
Author of The Binbrook Caucus
http://www.schaffter.ca