FOP character mapping problems

Mike Ferrando 24 Apr 2003 22:23:26 -0000

Friends,
I am converting documents from XML using XSL into FO files then
using FOP to convert these into PDF.


The problem I am having is that some of my NCRs (Numeric Character
References) are not coming through in the PDF. My FO file is correct:
the NCR "&# 299;" is retained in the FO file. (I have added a space
after "&#" so that you can see the reference itself; without the
space it might be displayed as the character, correctly or
incorrectly.)
The character is Unicode U+012B, in Latin Extended-A (0100-017F):
http://www.eki.ee/letter/chardata.cgi?ucode=012B
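
For reference, here is a minimal Python sketch (not part of the FOP
setup) that decodes the reference and checks the block range given
above. The sample word is the one from my document, and the variable
names are only illustrative.

```python
import html
import unicodedata

# Sample word from the message, with the NCR written without the extra space.
word = "Solov&#299;"

decoded = html.unescape(word)   # resolve the numeric character reference
last = decoded[-1]              # the character the NCR stands for
code_point = ord(last)

# Latin Extended-A covers U+0100 through U+017F.
in_latin_extended_a = 0x0100 <= code_point <= 0x017F

print(decoded)
print(f"U+{code_point:04X}", unicodedata.name(last))
print("Latin Extended-A:", in_latin_extended_a)
```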

I successfully created metric XML files for my userconfig.xml file.
Using only the font-family "Arial" (arial.xml), all characters are
displayed correctly when the -c userconfig.xml is given as an option
in the command line.

  fop -c conf\userconfig.xml -fo file.fo -pdf file.pdf

When the document is open in Acrobat 5, I try to search for words
that appear in the Arial font and get no results. Nothing is found by
the Acrobat search tool. However, if I set all text in the base font
(Times), and only the one character (&# 299;) in the "Arial" font, I
can find the whole word up to that character.

I have also looked through the cid-fonts.fo file. Basically, all
characters that are not in the Base 14 font sets are unsearchable.
The Apache site even confirms that this is the result of using metric
fonts for these characters. To quote:

When embedding TrueType fonts, a new font, containing only the glyphs
used, is created from the original font and embedded in the pdf.
Currently, this embedded font contains only the minimum data needed
to be embedded in a pdf document, and does not contain any codepage
information. The PDF document contains indexes to the glyphs in the
font instead of to encoded characters. While the document will be
displayed correctly, the net effect of this is that searching,
indexing, and cut-and-paste will not work properly.
http://xml.apache.org/fop/fonts.html#embedding

I thought that this paragraph applied to TTF files only, so I tried
TTC files, but the results were the same. Any characters that are not
part of the Base 14 fonts (that is, characters rendered from embedded
fonts) cannot be searched in the PDF output file.
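
A rough way to predict which characters will be affected is sketched
below in Python. It rests on an assumption: that the Base 14 text
fonts are limited to roughly Windows code page 1252. Python's cp1252
codec is only a stand-in for the PDF WinAnsiEncoding, and Symbol and
ZapfDingbats use their own encodings, so treat the result as a hint.
The function name fits_winansi is hypothetical.

```python
def fits_winansi(text: str) -> bool:
    """Return True if every character encodes in cp1252.

    Assumption: cp1252 approximates the encoding used for the
    Base 14 text fonts; it is not an exact match.
    """
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


# "Solov" stays within cp1252; U+012B (i with macron) does not.
print(fits_winansi("Solov"))
print(fits_winansi("Solov\u012b"))
```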

I know my XML XSL transformation is no problem, since I regularly
transform XML into HTML or SGML. So, I am trying to fix the problem
that occurs between the FO file and the PDF output file.

I am using Windows 2000.

Any suggestions?
Sincerely,
Mike Ferrando
Washington, DC


XML:
<element>Solov&# 299;</element> (space added)

--------
Search and replace '&#' with '& amp;#_'
--------

FO:
<fo:inline>Solov& amp;#_299;</fo:inline> (space added)

--------
Search and replace '& amp;#_' with '&#'
--------

PDF:
Solov#
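
The two search-and-replace steps can be sketched in Python as plain
string operations. This is only an illustration of the round trip on
strings, assuming the XSL transformation passes the placeholder
through unchanged; the function names protect_ncrs and restore_ncrs
are hypothetical, and the strings are written without the extra space.

```python
PLACEHOLDER = "&amp;#_"   # escaped form that survives the transformation


def protect_ncrs(xml_text: str) -> str:
    """Replace '&#' with the placeholder before the XSL step."""
    return xml_text.replace("&#", PLACEHOLDER)


def restore_ncrs(fo_text: str) -> str:
    """Turn the placeholder back into '&#' in the FO output."""
    return fo_text.replace(PLACEHOLDER, "&#")


source = "<element>Solov&#299;</element>"
protected = protect_ncrs(source)

# Stand-in for the FO file as serialized after the transformation.
fo_before = "<fo:inline>Solov&amp;#_299;</fo:inline>"
fo_after = restore_ncrs(fo_before)

print(protected)
print(fo_after)
```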

