[ 
https://issues.apache.org/jira/browse/PDFBOX-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shengwen updated PDFBOX-450:
----------------------------

    Attachment: zzz.pdf

The title, author, affiliation, and abstract on the first 
page are clearly set in different font sizes, but PDFTextStripper cannot extract 
the correct font information.

> PDFTextStripper CAN NOT extract correct font information for some early 
> produced PDF documents
> ----------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-450
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-450
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>         Environment: Windows XP, JDK 5.0
>            Reporter: Shengwen
>             Fix For: 0.8.0-incubator
>
>         Attachments: zzz.pdf
>
>
> PDFTextStripper cannot extract correct font information from the attached 
> document. When I traced into the code, I found that 
> TextPosition.getFontSize() = 1 and TextPosition.getXScale() = 0.24 in all 
> cases. However, TextPosition.getFont().getAverageFontWidth() clearly 
> returns different values.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
