[
https://issues.apache.org/jira/browse/PDFBOX-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475617#comment-13475617
]
Andreas Lehmkühler edited comment on PDFBOX-1129 at 10/13/12 2:13 PM:
----------------------------------------------------------------------
I'm afraid the PDF doesn't contain any information on how to map those glyphs.
Even Adobe Reader isn't able to map them.
I have set the resolution to "Not A Problem".
was (Author: lehmi):
I'm afraid the pdf doesn't contain any information how to map those glyphs.
Even the adobe rerader isn't able to map those.
Set resolution to "not a problem"
> Quote glyphs (quoteright, quotedblright, etc.) not mapped to the right
> Unicode character
> ----------------------------------------------------------------------------------------
>
> Key: PDFBOX-1129
> URL: https://issues.apache.org/jira/browse/PDFBOX-1129
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.7.0
> Reporter: Michael McCandless
> Assignee: Andreas Lehmkühler
> Priority: Minor
> Attachments: 000086.pdf
>
>
> I have an example PDF (will attach) that uses a right single quote
> character, but PDFBox extracts it incorrectly (using ExtractText).
> If I copy/paste, the text is correct (I get U+2019 for the right
> quote).
> Search for "cashier" on page 1 of the PDF to see it; I think that
> right quote is supposed to come through as U+2019.
> I looked at the PDF in PDFDebugger, and I see this fragment in the
> "Contents" for page 1:
> (Bring the voucher handout to the cashier\325s office \(10-180\))Tj
> So somehow this \325 escape fails to map to the quoteright glyph. The
> font is a partially embedded font, BPOLKO+TimesNewRomanPSMT, and I can
> see in the Charset (under FontDescriptor, for font F1) that it
> references this glyph.
> I also see a [correct] entry in glyphlist.txt, mapping to U+2019, so
> that's not the problem.
> Not sure what's going wrong... maybe somehow \325 fails to map to
> quoteright?
> There are other glyphs (quotedblright, quotedblleft) that are also not
> converted correctly, e.g. search for "project review" on page 2.
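
The \325 in the content stream fragment quoted above is an octal escape for a single byte (0xD5), and which character that byte stands for depends entirely on the encoding the font uses. The following is a minimal Java sketch of that dependency, not PDFBox code: the class and method names are hypothetical, the small table only lists the Mac OS Roman codes for the four curly quotes, and whether the attached PDF's font actually uses that encoding is an assumption the issue does not confirm.

```java
import java.util.Map;

// Hypothetical demo class; none of these names come from PDFBox.
public class OctalEscapeDemo {

    // Mac OS Roman code points for the curly quotes.
    // Assumption: used here only to show one encoding in which 0xD5 is quoteright.
    static final Map<Integer, Character> MAC_ROMAN_QUOTES = Map.of(
            0xD2, '\u201C',  // quotedblleft
            0xD3, '\u201D',  // quotedblright
            0xD4, '\u2018',  // quoteleft
            0xD5, '\u2019'   // quoteright
    );

    // Turns a PDF string escape such as "\325" into its byte value.
    static int parseOctalEscape(String escape) {
        return Integer.parseInt(escape.substring(1), 8);
    }

    // ISO-8859-1 maps each byte directly to the same Unicode code point.
    static char decodeLatin1(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        int code = parseOctalEscape("\\325");
        System.out.printf("byte value: 0x%02X%n", code);
        System.out.printf("as Latin-1: U+%04X%n", (int) decodeLatin1(code));
        System.out.printf("as Mac Roman: U+%04X%n", (int) MAC_ROMAN_QUOTES.get(code));
    }
}
```

If the same byte is read under the wrong encoding, or no usable encoding information is present at all, an extractor has no reliable way to arrive at U+2019, which is consistent with the resolution given in the comment.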