[jira] Issue Comment Edited: (PDFBOX-450) PDFTextStripper CAN NOT extract correct font information for some early produced PDF documents

Shengwen (JIRA) Wed, 01 Apr 2009 20:36:38 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694878#action_12694878
 ]


Shengwen edited comment on PDFBOX-450 at 4/1/09 8:34 PM:
---------------------------------------------------------

It's obvious that the title, author, affiliation, and abstract on the first 
page of zzz.pdf have different font sizes, but PDFTextStripper cannot 
extract the correct font information.

      was (Author: [email protected]):
    It's obvious that the title, author, affiliation, and abstract on the first 
page have different font sizes, but PDFTextStripper cannot extract the correct 
font information.
  
> PDFTextStripper CAN NOT extract correct font information for some early 
> produced PDF documents
> ----------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-450
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-450
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>         Environment: Windows XP, JDK 5.0
>            Reporter: Shengwen
>             Fix For: 0.8.0-incubator
>
>         Attachments: 1066.pdf, zzz.pdf
>
>
> PDFTextStripper cannot extract the correct font information from the attached 
> document.  When I traced through the code, I found that 
> TextPosition.getFontSize() = 1 and TextPosition.getXScale() = 0.24 in all 
> cases. However, TextPosition.getFont().getAverageFontWidth() clearly returns 
> different values.
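
The symptom described above can be illustrated with a small, self-contained
sketch. It does not use PDFBox: GlyphInfo is a hypothetical stand-in for the
three values the report reads from each TextPosition, and multiplying the font
size by the X scale is only one plausible way to combine them, assumed here for
illustration. The average-width numbers in main are made-up placeholders, not
measurements from zzz.pdf or 1066.pdf.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the values read from a TextPosition in the report.
final class GlyphInfo {
    final float fontSize;     // as returned by TextPosition.getFontSize()
    final float xScale;       // as returned by TextPosition.getXScale()
    final float avgFontWidth; // as returned by getFont().getAverageFontWidth()

    GlyphInfo(float fontSize, float xScale, float avgFontWidth) {
        this.fontSize = fontSize;
        this.xScale = xScale;
        this.avgFontWidth = avgFontWidth;
    }
}

final class FontInfoCheck {
    // Distinct values of fontSize * xScale (an assumed way to combine the two).
    static Set<Float> distinctScaledSizes(List<GlyphInfo> glyphs) {
        Set<Float> sizes = new TreeSet<>();
        for (GlyphInfo g : glyphs) {
            sizes.add(g.fontSize * g.xScale);
        }
        return sizes;
    }

    // Distinct average font widths seen across the same glyphs.
    static Set<Float> distinctAverageWidths(List<GlyphInfo> glyphs) {
        Set<Float> widths = new TreeSet<>();
        for (GlyphInfo g : glyphs) {
            widths.add(g.avgFontWidth);
        }
        return widths;
    }

    public static void main(String[] args) {
        // Font size and X scale match the report; the widths are placeholders.
        List<GlyphInfo> glyphs = Arrays.asList(
                new GlyphInfo(1f, 0.24f, 600f),
                new GlyphInfo(1f, 0.24f, 520f),
                new GlyphInfo(1f, 0.24f, 480f));

        System.out.println("distinct scaled sizes:   " + distinctScaledSizes(glyphs).size());
        System.out.println("distinct average widths: " + distinctAverageWidths(glyphs).size());
    }
}
```

If the size and scale are constant for every glyph, as reported, any value
derived from them alone is constant too, so the title and the abstract cannot
be told apart that way. Only the average font width varies between them.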

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
