mixing Cyrillic and Roman characters -> PDF output (repost)

Jay Berkenbilt Sat, 08 Nov 2008 08:41:39 -0800

For some reason, the body of my previous message did not make it to
the list.  Only the attachment made it.  I'm not sure why.  Here is
the body again, with the attachment included inline at the end.


----------------------------------------------------------------------

Using fop 0.95 with the PDF output format, if my input text mixes
Roman and Cyrillic characters, what do I have to do to get fop to show
the proper Cyrillic characters in the PDF output (rather than just
'#')?  This works with the rtf, txt, and awt output methods but not
with PS or PDF.  I am aware of the need to create Unicode mappings in
PDF (having written software that generates PDF), but I don't see how
to tell fop to do this.  I understand that such a mapping is not
required for the other formats since the mapping is handled by the
viewer.

I apologize if this is a FAQ.  I've searched the list archives and
Google, and I've seen many similar questions, but they seem to refer
to older versions of fop, and I haven't been able to see resolutions.
I've seen documentation about embedding fonts, but it seems to be
geared more toward adding typefaces than mapping Unicode characters.
I am using a Debian system.  I've tested this both with the Debian fop
packages and by just downloading a binary distribution.  I've also
installed Type 1 Cyrillic fonts and run fop with the following
configuration file:

<fonts>
  <directory>/usr/share/fonts/X11/Type1</directory>
  <auto-detect/>
</fonts>

but this had no effect.  Perhaps I need to do more than that.

Looking at the source code, I can see that fop is explicitly
substituting '#' for any character that it doesn't know how to map,
but it seems to be hard-coding WinAnsiEncoding for the mapping.  As
far as I know, WinAnsiEncoding is a single-byte encoding and is not
going to have the Cyrillic characters in it.  I could be off
here; I've only skimmed the code.  I am certain that
the # characters are being generated by fop and not the result of some
kind of font substitution issue at viewing time.  Here is an excerpt
from the actual PDF content stream as generated by fop:

q
1 0 0 1 72 72 cm
BT
/F1 12 Tf
1 0 0 -1 0 10.266 Tm [(Russian) ( ) (spelling) ( ) (of) ( ) (Berkenblit:) ( ) 
(##########.) ] TJ
ET
Q
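
To make the suspected behavior concrete, here is a minimal sketch of
the substitution I believe is happening.  It is not fop's code: the
function name is hypothetical, and Python's cp1252 codec is used as an
approximation of WinAnsiEncoding.  Any character that cannot be
encoded is replaced by '#'.

```python
def to_winansi_with_fallback(text, fallback="#"):
    """Replace every character that cp1252 cannot encode with `fallback`.

    Hypothetical helper for illustration only; cp1252 stands in for
    PDF's WinAnsiEncoding, which it closely resembles.
    """
    out = []
    for ch in text:
        try:
            ch.encode("cp1252")   # single-byte encoding, no Cyrillic
            out.append(ch)
        except UnicodeEncodeError:
            out.append(fallback)  # unmappable character
    return "".join(out)


print(to_winansi_with_fallback("Russian spelling of Berkenblit: Беркенблит."))
# prints: Russian spelling of Berkenblit: ##########.
```

The ten '#' characters match the ten Cyrillic letters in the content
stream above, which is consistent with a per-character fallback rather
than a viewer-side font problem.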

I must be missing something here.  Here are some specific questions:

 * Is what I'm doing supposed to work?  It seems like fop should be
   able to do the right thing with UTF-8 encoded text in multiple
   languages.  fop is just silently substituting # without even
   generating a warning, even if I run with the -d flag.

 * Do I have to set up a table somewhere that maps a range of
   characters to a font, or is fop supposed to do that automatically?
   Is there some configuration that I can use to tell it to use a
   different mapping that it might already know about?  Is just
   embedding the appropriate font sufficient?  I'm not aware of Type 1
   fonts containing Unicode mapping data.

I've attached a sample fo input file.  I am just running

fop a.fo a.pdf

to generate the output.  Thanks for any assistance!

----------------------------------------------------------------------

<?xml version="1.0" encoding="UTF-8"?>

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

 <fo:layout-master-set>
  <fo:simple-page-master master-name="master" margin="1in">
   <fo:region-body region-name="main-body"/>
  </fo:simple-page-master>
 </fo:layout-master-set>

 <fo:page-sequence master-reference="master">
  <fo:flow flow-name="main-body">
   <fo:block>
    Russian spelling of Berkenblit: Беркенблит.
   </fo:block>
  </fo:flow>
 </fo:page-sequence>

</fo:root>
