[ 
https://issues.apache.org/jira/browse/PDFBOX-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694879#action_12694879
 ] 

Shengwen edited comment on PDFBOX-450 at 4/1/09 8:35 PM:
---------------------------------------------------------

By the way, PDFTextStripper cannot handle the non-ASCII character in the 
author line of the attached document (1066.pdf). It produces a space in 
place of the non-ASCII character (i.e. "Anu Pramila, Anja Keskinarkaus, and 
Tapio Sepp anen").

Thanks very much.

> PDFTextStripper CAN NOT extract correct font information for some early 
> produced PDF documents
> ----------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-450
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-450
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>         Environment: Windows XP, JDK 5.0
>            Reporter: Shengwen
>             Fix For: 0.8.0-incubator
>
>         Attachments: 1066.pdf, zzz.pdf
>
>
> PDFTextStripper cannot extract correct font information from the attached 
> document. When I traced into the code, I found that 
> TextPosition.getFontSize() = 1 and TextPosition.getXScale() = 0.24 in all 
> cases, even though TextPosition.getFont().getAverageFontWidth() clearly 
> returns different values.
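One possible reading of the symptom above (an assumption; the report does not confirm it): a font size of 1 often means the PDF selects the font with "1 Tf" and puts the real scaling into the text matrix (Tm) and/or the current transformation matrix (CTM), so the rendered size only appears after multiplying them. The minimal Java sketch that follows computes that effective size from PDF-style [a b c d e f] matrices. The class and method names are hypothetical, not part of PDFBox, and the sketch ignores horizontal scaling (Tz) and text rise.

```java
/**
 * Hypothetical sketch (not PDFBox code): recover the rendered font size
 * when the font is set with a size of 1 and the real scaling lives in the
 * text matrix (Tm) and/or the CTM. Matrices are PDF-style [a b c d e f].
 */
final class EffectiveFontSize {

    /** Multiply two PDF affine matrices, result = m1 x m2 (row-vector convention). */
    static double[] multiply(double[] m1, double[] m2) {
        return new double[] {
            m1[0] * m2[0] + m1[1] * m2[2],
            m1[0] * m2[1] + m1[1] * m2[3],
            m1[2] * m2[0] + m1[3] * m2[2],
            m1[2] * m2[1] + m1[3] * m2[3],
            m1[4] * m2[0] + m1[5] * m2[2] + m2[4],
            m1[4] * m2[1] + m1[5] * m2[3] + m2[5]
        };
    }

    /** Effective size = Tf size times the length of the transformed unit y-vector of (Tm x CTM). */
    static double effectiveFontSize(double tfSize, double[] tm, double[] ctm) {
        double[] m = multiply(tm, ctm);
        // (c, d) is where the glyph's vertical unit vector ends up; using its
        // length instead of just d keeps the result correct for rotated text.
        double yScale = Math.hypot(m[2], m[3]);
        return tfSize * yScale;
    }

    public static void main(String[] args) {
        double[] tm = {9.96, 0, 0, 9.96, 72, 700}; // hypothetical "9.96 0 0 9.96 72 700 Tm"
        double[] ctm = {1, 0, 0, 1, 0, 0};          // identity CTM
        System.out.println(effectiveFontSize(1.0, tm, ctm));
    }
}
```

With an identity CTM, "1 Tf" combined with a Tm that scales by 9.96 renders at about 9.96 units, a size that a reported font size of 1 would hide. If the CTM also scales (for example by 0.5), the same Tm yields half that size, which is one way a per-axis scale value can differ from the size a reader actually sees.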

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to