[
https://issues.apache.org/jira/browse/PDFBOX-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694878#action_12694878
]
Shengwen edited comment on PDFBOX-450 at 4/1/09 8:34 PM:
---------------------------------------------------------
The title, author, affiliation, and abstract on the first page of zzz.pdf
are clearly set in different font sizes, but PDFTextStripper cannot extract
the correct font information.
was (Author: [email protected]):
The title, author, affiliation, and abstract on the first page are clearly
set in different font sizes, but PDFTextStripper cannot extract the correct
font information.
> PDFTextStripper CAN NOT extract correct font information for some early
> produced PDF documents
> ----------------------------------------------------------------------------------------------
>
> Key: PDFBOX-450
> URL: https://issues.apache.org/jira/browse/PDFBOX-450
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 0.8.0-incubator
> Environment: Windows XP, JDK 5.0
> Reporter: Shengwen
> Fix For: 0.8.0-incubator
>
> Attachments: 1066.pdf, zzz.pdf
>
>
> PDFTextStripper cannot extract correct font information from the attached
> document. When I traced into the code, I found that
> TextPosition.getFontSize() = 1 and TextPosition.getXScale() = 0.24 in all
> cases. However, TextPosition.getFont().getAverageFontWidth() clearly
> returns different values.
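
The symptom described above can be checked mechanically. The following is a minimal, self-contained sketch that does not call PDFBox; the `Sample` class is a hypothetical stand-in for the three values read from each TextPosition. The font size (1) and x scale (0.24) are the values from the report, while the average font widths are made-up illustrative numbers, not measurements from zzz.pdf or 1066.pdf.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FontInfoCheck {

    /** Hypothetical holder for the values read from one TextPosition. */
    public static final class Sample {
        final double fontSize;         // value of TextPosition.getFontSize()
        final double xScale;           // value of TextPosition.getXScale()
        final double averageFontWidth; // value of getFont().getAverageFontWidth()

        public Sample(double fontSize, double xScale, double averageFontWidth) {
            this.fontSize = fontSize;
            this.xScale = xScale;
            this.averageFontWidth = averageFontWidth;
        }
    }

    /** Counts the distinct (fontSize, xScale) pairs across all samples. */
    public static int distinctSizeKeys(List<Sample> samples) {
        Set<String> keys = new LinkedHashSet<>();
        for (Sample s : samples) {
            keys.add(s.fontSize + "@" + s.xScale);
        }
        return keys.size();
    }

    /** Counts the distinct average font widths across all samples. */
    public static int distinctAverageWidths(List<Sample> samples) {
        Set<Double> widths = new LinkedHashSet<>();
        for (Sample s : samples) {
            widths.add(s.averageFontWidth);
        }
        return widths.size();
    }

    /** Reported pattern: constant size and scale; the widths are illustrative only. */
    public static List<Sample> reportedPattern() {
        return Arrays.asList(
                new Sample(1.0, 0.24, 500.0),
                new Sample(1.0, 0.24, 430.0),
                new Sample(1.0, 0.24, 610.0));
    }

    public static void main(String[] args) {
        List<Sample> samples = reportedPattern();
        System.out.println("distinct (fontSize, xScale) pairs: " + distinctSizeKeys(samples));
        System.out.println("distinct average font widths: " + distinctAverageWidths(samples));
    }
}
```

If the samples collected from a real document show a single (fontSize, xScale) pair but several distinct average widths, that matches the behavior reported here: the size fields do not distinguish the text runs, even though the fonts themselves differ.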