Re: [Groff] typesetting Czech with custom fonts

2012-03-29 Thread Werner LEMBERG

 You are correct that full UTF-16 is supported for annotations, the
 problem is that by the time the string is passed to pdfbookmark the
 characters have been changed to named glyph nodes which I believe
 can't be converted back to their UTF-16 character code
 (i.e. \[u0159]) within a macro, [...]

\X allows \[...] if `use_charnames_in_special' is set in the DESC
file.  This might help for gropdf which can then convert such entities
to proper PDF string literals.  BTW, `.device' doesn't have this
restriction, so

  .device \[foo]

gets happily emitted as

  x X \[foo]

even without `use_charnames_in_special'.

 In order to do this I think we'd need help from troff, something
 like .asciify16hex which would return the string as a BOM followed
 by the two byte unicode for each character, i.e. 00 41 01 59 (A
 rcaron)

You mean this hypothetical call

  .asciify16hex A\[u0159]

should return the string

  `00410159'

right?

 ... this could then be passed onto the pdf enclosed in '<>' with a
 BOM on the front instead of enclosing the text in '()'.

Why do you need a Byte Order Mark?  Note, however, that you actually
need UTF-16BE encoding for PDF literals, IIRC, so Unicode values
larger than U+ must be represented as surrogate pairs.

 Even being able to reconstitute \[u0159] would be helpful for
 gropdf, since it could then build the hex string itself.

What exactly do you mean with `reconstitute'?

 I've been looking into .asciify in a bit more detail (in preparation
 for the documentation patch you asked for).  Please can you confirm
 I've got this correct: [...]

Looks fine.

 My c++ foo is not strong but I suspect the nodes marked as ignored
 (which have no specific asciify method) inherit the generic node
 method which is to return the node.

Correct.

 It can be seen from the above that in several cases the asciified
 string/diversion will still hold nodes as well as ascii characters.

Correct.


Werner



Re: [Groff] typesetting Czech with custom fonts

2012-03-29 Thread Petr Man
Dear Werner,

On Thu, Mar 29, 2012 at 06:53, Werner LEMBERG w...@gnu.org wrote:

  # generate.pe

  Open($1);
  Generate($fontname + ".pfa"); # this also generates the .afm file
  Generate($fontname + ".t42");

 Call this with e.g.

  fontforge -script generate.pe GS_CE_.TTF


Fontforge worked like magic, I now have all the characters I need. Thanks a lot.

Petr



Re: [Groff] typesetting Czech with custom fonts

2012-03-28 Thread Werner LEMBERG

 My question is whether this is caused by incorrect font conversion
 or if the problem lies somewhere else.

To help you, we need a minimal example which exposes the problem,
together with all the necessary stuff (including fonts).


Werner



Re: [Groff] typesetting Czech with custom fonts

2012-03-28 Thread Werner LEMBERG

 This is a (painful) limitation of Adobe's pdfmark specification:
 only a rather limited set of characters is permitted within the text
 which is specified to describe a bookmark.

This is not correct, AFAIK.  There are two encodings for pdfbookmarks,
namely PDFDocEncoding and Unicode.  So it should certainly be possible
to use Czech characters, but apparently groff's pdfmark package
doesn't support Unicode bookmarks.

Deri, what about gropdf?


Werner



Re: [Groff] typesetting Czech with custom fonts

2012-03-28 Thread Petr Man
On Wed, Mar 28, 2012 at 15:29, Deri James d...@chuzzlewit.demon.co.uk wrote:
 Are you talking about missing from the bookmark outline panel or missing from
 the text of the document?

 Missing from outline = each \X warning indicates a character was dropped. (For
 the reason Keith gave).
 Missing from document = probable font problem. (And we'd need a minimal
 example as requested by Werner).

 Deri

Hi Deri,

Keith clarified that the errors came from pdfroff; I thought they
were from groff directly. At this moment I don't care about the
bookmarks really, all I want is to have the document display
correctly.

Example text should read:
Příliš žluťoučký kůň úpěl ďábelské ódy.

with iconv '-futf8' '-tlatin2', pipe to groff, doesn't complain:
Píli lu»ouký k úpl ábelské ódy.

with no conversion, complains with stdin:136: can't translate
character code 195 to special character `~A' in transparent
throughput:
Pli luouÄk k pÄl Äbelsk© dy.

When I use -k -Dutf8:
P liš žluouk k pl belsk dy.

This happens for me with the default fonts as well, not just the ones
that I have converted. I suspect groff doesn't know how to find the
glyphs.

Petr



Re: [Groff] typesetting Czech with custom fonts

2012-03-28 Thread Werner LEMBERG

 Example text should read:
 Příliš žluťoučký kůň úpěl ďábelské ódy.
 
 with iconv '-futf8' '-tlatin2', pipe to groff, doesn't complain:
 Píli lu»ouký k úpl ábelské ódy.

But this is not correct usage.  groff internally uses latin1 encoding.
If you really want to use latin2, you must explicitly load the proper
macro package which maps latin2 encoding to encoding-independent
representation forms (\[..] constructs):

  cat cz \
  | iconv -f utf8 -t latin2 \
  | groff -mlatin2 -Tutf8

However, if you replace the `-Tutf8' backend with `-Tps', you get a
bunch of warnings because the standard PS fonts don't have all
necessary glyphs.

Instead of using an external iconv program or an old legacy encoding,
I recommend groff's `preconv' preprocessor (option `-k' or `-K enc')
which converts input in various encodings into groff's internal
character representation:

  cat cz \
  | groff -k -Tutf8

Much easier, much shorter.
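
For readers unfamiliar with what "groff's internal character
representation" looks like: the sketch below approximates the core of
that conversion in Python.  It is an illustration only, not preconv's
actual code; the function name `to_groff_entities' is made up for this
example, and the real preconv does more (encoding detection, for one).
ASCII passes through unchanged and everything else becomes a
\[uXXXX] entity.

```python
def to_groff_entities(text: str) -> str:
    """Replace every non-ASCII character with a \\[uXXXX] escape.

    Hypothetical helper that approximates the idea behind preconv;
    it is not part of groff.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                    # plain ASCII: keep as-is
        else:
            out.append("\\[u%04X]" % cp)      # e.g. U+0159 -> \[u0159]
    return "".join(out)


print(to_groff_entities("k\u016f\u0148"))     # the word "kůň"
```

Whether the output device can then render such an entity still
depends on the font in use providing the glyph.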


Werner


Re: [Groff] typesetting Czech with custom fonts

2012-03-28 Thread Werner LEMBERG

 groff internally uses latin1 encoding.

Mhmm, bad wording.  latin1 is just the default setup for all backends
except -Tutf8.


Werner



Re: [Groff] typesetting Czech with custom fonts

2012-03-28 Thread Petr Man
On Wed, Mar 28, 2012 at 18:27, Werner LEMBERG w...@gnu.org wrote:
 But this is not correct usage.  groff internally uses latin1 encoding.

 However, if you replace the `-Tutf8' backend with `-Tps', you get a
 bunch of warnings because the standard PS fonts don't have all
 necessary glyphs.

 Instead of using an external iconv program or an old legacy encoding,
 I recommend groff's `preconv' preprocessor (option `-k' or `-K enc')
 which converts input in various encodings into groff's internal
 character representation:

  cat cz \
  | groff -k -Tutf8

 Much easier, much shorter.
Werner,
that is very much what I would like to use. And I tried, I actually
use it always by default, but I still end up with missing characters.
You didn't by any chance have a look at the file/fonts I mailed
you off-list? I suspect I simply didn't convert them correctly, but I
have no idea what could be wrong. As I previously wrote, I used the
method from mom's manual.
Petr



Re: [Groff] typesetting Czech with custom fonts

2012-03-28 Thread Keith Marshall
On 28/03/12 16:09, Petr Man wrote:
 Keith clarified, 

I didn't...

 that the errors came from pdfroff, 

...because they don't.

 I thought they were from groff directly.

They are; specifically, when groff processes this...

  .nop \X'ps:exec [\\$* pdfmark'\c

...expression as it expands a .pdfmark macro invocation, in which $*
contains any groff special character, (such as the \(de I mentioned
earlier).  Now, it may be that \X can pass Unicode code point data, but
it is documented, (in groff's texinfo manual), that it will not handle
any groff escape, (other than a select few which are simply ignored), so
anything which needs an escape to express it, would seem to be excluded
from any pdfmark, such as is required to place a bookmark.

-- 
Regards,
Keith.



Re: [Groff] typesetting Czech with custom fonts

2012-03-28 Thread Deri James
On Wednesday 28 Mar 2012 16:02:01 Werner LEMBERG wrote:
  This is a (painful) limitation of Adobe's pdfmark specification:
  only a rather limited set of characters is permitted within the text
  which is specified to describe a bookmark.
 
 This is not correct, AFAIK.  There are two encodings for pdfbookmarks,
 namely PDFDocEncoding and Unicode.  So it should certainly be possible
 to use Czech characters, but apparently groff's pdfmark package
 doesn't support Unicode bookmarks.
 
 Deri, what about gropdf?
 
 
 Werner

Hi Werner,

You are correct that full UTF-16 is supported for annotations; the
problem is that by the time the string is passed to pdfbookmark the
characters have been changed to named glyph nodes, which I believe
can't be converted back to their UTF-16 character code
(i.e. \[u0159]) within a macro, so I'm in the same boat as Keith.  In
order to do this I think we'd need help from troff, something like
.asciify16hex, which would return the string as a BOM followed by the
two-byte Unicode value for each character, i.e. 00 41 01 59 (A
rcaron) ... this could then be passed on to the PDF enclosed in '<>'
with a BOM on the front instead of enclosing the text in '()'.  Even
being able to reconstitute \[u0159] would be helpful for gropdf,
since it could then build the hex string itself.
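
To make the target format concrete, here is a minimal Python sketch
of the string such a request (or gropdf itself) would have to
produce.  It assumes the usual PDF convention that a text string
starting with the bytes FE FF is read as UTF-16BE rather than
PDFDocEncoding; the function name `pdf_hex_text_string' is
hypothetical and nothing like it exists in groff.

```python
def pdf_hex_text_string(text: str) -> str:
    """Return text as a PDF hex string: BOM + UTF-16BE, inside <>.

    Hypothetical helper for illustration only.
    """
    data = b"\xfe\xff" + text.encode("utf-16-be")   # BOM, then big-endian code units
    return "<" + data.hex().upper() + ">"


print(pdf_hex_text_string("A\u0159"))   # A followed by rcaron
```

Python's utf-16-be codec emits surrogate pairs for code points
outside the Basic Multilingual Plane, so those need no special
handling in this sketch.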

I've been looking into .asciify in a bit more detail (in preparation
for the documentation patch you asked for).  Please can you confirm
I've got this correct:

Node                        Action
----                        ------
line_start_node             deleted
space_node                  If was_escape_colon return ESCAPE_COLON
                            else return node
word_space_node             return space(s)
unbreakable_space_node      return ESCAPE_TILDE
diverted_space_node         Ignored
diverted_copy_file_node     Ignored
extra_size_node             Ignored
vertical_size_node          deleted
hmotion_node                If was_tab return tab else return node
space_char_hmotion_node     return ESCAPE_SPACE
vmotion_node                Ignored
hline_node                  Ignored
vline_node                  Ignored
zero_width_node             Ignored
left_italic_corrected_node  deleted
overstrike_node             Ignored
bracket_node                Ignored
draw_node                   Ignored
glyph_node                  If asciify_code or ascii_code not 0
                            return chr() else return node
ligature_node               deleted
kern_pair_node              deleted
dbreak_node                 deleted
italic_corrected_node       deleted

My C++ foo is not strong, but I suspect the nodes marked as ignored
(which have no specific asciify method) inherit the generic node
method, which is to return the node.

It can be seen from the above that in several cases the asciified 
string/diversion will still hold nodes as well as ascii characters.
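
A toy Python model of a few rows of the table above shows the mixed
result.  It is only a sketch of the table as posted, not of groff's
C++ source: nodes are represented as made-up (kind, char) pairs, and
only the "deleted", word_space_node and glyph_node rows are modelled;
every other kind is treated as "return the node".

```python
# Node kinds that the table lists as "deleted".
DELETED = {
    "line_start_node", "vertical_size_node", "left_italic_corrected_node",
    "ligature_node", "kern_pair_node", "dbreak_node", "italic_corrected_node",
}


def asciify(nodes):
    """Model .asciify on a list of (kind, ascii_char_or_None) pairs."""
    result = []
    for kind, char in nodes:
        if kind in DELETED:
            continue                          # dropped from the output
        if kind == "word_space_node":
            result.append(" ")                # becomes a plain space
        elif kind == "glyph_node" and char is not None:
            result.append(char)               # glyph with an ASCII code
        else:
            result.append((kind, char))       # node survives unchanged
    return result


line = [
    ("line_start_node", None),
    ("glyph_node", "A"),
    ("word_space_node", None),
    ("glyph_node", None),                     # e.g. \[u0159]: no ASCII code
    ("vmotion_node", None),
]
print(asciify(line))
```

The last two entries come back as nodes, which is exactly the case
that prevents a macro from recovering \[u0159] for a bookmark.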

Does this look correct Werner?

As regards gropdf handling the Czech example given, that seems to
work perfectly with fonts which contain the needed characters,
although I did fix a problem in this area quite recently, so I owe
you a patch for this.

Cheers 

Deri



Re: [Groff] typesetting Czech with custom fonts

2012-03-28 Thread Peter Schaffter
On Thu, Mar 29, 2012, Werner Lemberg wrote:
  As I previously wrote, I used the method from mom's manual.
 
 Interesting.  I don't have time to verify the steps (and I don't know
 some of the involved programs), but you did it, and you failed.  So
 maybe the instructions should be revised.  Peter?

The mom instructions specify ttf2pt1 for the conversion from
TrueType to Type1.  I have on occasion had difficulty with it
myself; I notice it's been dropped from some of the major distros.
Werner is quite right to recommend fontforge.  And yes, the momdocs
need to be revised.

Deri James has been putting a lot of work into integrating the mom
macros with pdf.tmac and the gropdf device.  I'm waiting till that
project is complete, at which point I'll be updating contrib/mom,
including the documentation.
 
-- 
Peter Schaffter

Author of The Binbrook Caucus
http://www.schaffter.ca