[ https://issues.apache.org/jira/browse/PDFBOX-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shengwen updated PDFBOX-450:
----------------------------
Attachment: zzz.pdf
The title, author, affiliation, and abstract on the first page clearly use
different font sizes, but PDFTextStripper cannot extract the correct font
information.
> PDFTextStripper CAN NOT extract correct font information for some early
> produced PDF documents
> ----------------------------------------------------------------------------------------------
>
> Key: PDFBOX-450
> URL: https://issues.apache.org/jira/browse/PDFBOX-450
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 0.8.0-incubator
> Environment: Windows XP, JDK 5.0
> Reporter: Shengwen
> Fix For: 0.8.0-incubator
>
> Attachments: zzz.pdf
>
>
> PDFTextStripper cannot extract the correct font information from the attached
> document. When I traced into the code, I found that
> TextPosition.getFontSize() = 1 and TextPosition.getXScale() = 0.24 in all
> cases. However, TextPosition.getFont().getAverageFontWidth() clearly returns
> different values.