Re: Questions about toUnicode Cmap

Andreas Lehmkuehler Tue, 13 Mar 2012 12:32:11 -0700

Hi,

On 13.03.2012 19:10, Andreas Lehmkuehler wrote:

Hi


On 09.03.2012 07:30, Andreas Lehmkuehler wrote:

Hi,

On 08.03.2012 09:52, Leleu Eric wrote:

Hi,

2012/3/8 Andreas Lehmkuehler<[email protected]>

Hi,

On 07.03.2012 09:15, Leleu Eric wrote:

Hi all,

<SNIP>

I don't need to render the text in the preflight component; I only check
that the glyph is present and that its width is consistent.

Bypassing the AWT font would be great, but it is a huge amount of work.

Yes, but we need to do that, because some of the needed fonts aren't supported
or the support is buggy, see PDFBOX-490.

What is your point of view about these two points?

Probably we can find a workaround for your issue, but I need some more
details on how the preflight code works (see above).

I had a look and I guess there is no workaround.

I don't know the original purpose of PDFont#encode, but nowadays it tries to
provide a readable version of the encoded text. AFAIK it's used in 3 different
cases:

- text extraction: works fine as long as PDFBox knows how to encode the text
- rendering: the rendering uses java.awt.Font#drawString and therefore it also
needs the readable text. BUT this doesn't work in many cases (CID fonts,
substituted fonts etc.). In the long run we have to use the cid as well in order
to support every kind of font
- preflight: ContentStreamWrapper#validText expects to get the CID when calling
PDFont#encode but that only works if cid == string

To make it more complicated, the encoding cmap is overwritten if a ToUnicode
cmap is used at the same time.

TODO:

- separate the ToUnicode cmap from the encoding cmap

I guess that's done [1]

- split PDFont#encode into one method providing the string and one providing
the cid.
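
To illustrate the two TODO items, here is a rough sketch of what the separation
could look like. It is not the actual PDFBox code: the class FontCodeMapper and
all of its method names are made up for this example, and the two CMaps are
modelled as plain maps. The point is only that the encoding cmap and the
ToUnicode cmap are kept in separate fields, so that loading one cannot
overwrite the other, and that the cid and the readable string are returned by
two different methods.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the PDFBox API. */
final class FontCodeMapper {
    // Encoding cmap: character code -> cid (what preflight needs for glyph and width checks).
    private final Map<Integer, Integer> encodingCMap = new HashMap<>();
    // ToUnicode cmap: character code -> readable text (what text extraction needs).
    private final Map<Integer, String> toUnicodeCMap = new HashMap<>();

    void addCidMapping(int code, int cid) {
        encodingCMap.put(code, cid);
    }

    void addUnicodeMapping(int code, String text) {
        toUnicodeCMap.put(code, text);
    }

    /** Returns the cid for the given character code, or -1 if the code is unmapped. */
    int codeToCid(int code) {
        return encodingCMap.getOrDefault(code, -1);
    }

    /** Returns the readable text for the given character code, or null if unmapped. */
    String codeToUnicode(int code) {
        return toUnicodeCMap.get(code);
    }
}
```

With something like this, preflight would call codeToCid and text extraction
would call codeToUnicode, and neither depends on cid == string.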


BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-1252
