[jira] [Closed] (PDFBOX-1038) Strange signs after pdftohtml parsing.

JIRA Thu, 23 Oct 2014 11:28:01 -0700

     [ 
https://issues.apache.org/jira/browse/PDFBOX-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andreas Lehmkühler closed PDFBOX-1038.
--------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.0
         Assignee: Andreas Lehmkühler

Works fine starting with at least 1.6.0, except for a small part of the text 
which can't be extracted due to a missing mapping. Acrobat gives a similar result.
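
For readers unfamiliar with why a missing mapping blocks extraction: with an 
Identity-H encoded font, each character is stored as a 2-byte code that only 
identifies a glyph, so an extractor needs a separate code-to-Unicode table to 
recover text. The sketch below illustrates that idea only. It is not PDFBox's 
implementation; the class name, the decode helper, and the sample codes and 
table entries are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not PDFBox code.
class ToUnicodeSketch {

    // Decode big-endian 2-byte codes (as used with Identity-H) through a
    // code-to-Unicode table. Codes without an entry cannot be turned into
    // text, so they are emitted as U+FFFD (the replacement character).
    static String decode(byte[] codes, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < codes.length; i += 2) {
            int code = ((codes[i] & 0xFF) << 8) | (codes[i + 1] & 0xFF);
            String text = toUnicode.get(code);
            sb.append(text != null ? text : "\uFFFD");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up table: only two of the three codes below are mapped.
        Map<Integer, String> toUnicode = new HashMap<>();
        toUnicode.put(0x0024, "A");
        toUnicode.put(0x0025, "B");

        byte[] codes = {0x00, 0x24, 0x00, 0x25, 0x00, 0x26};
        System.out.println(decode(codes, toUnicode));
    }
}
```

Running it prints "AB" followed by the replacement character U+FFFD (how that 
character is displayed depends on the console encoding): the third code has no 
table entry, which is analogous to the part of the text that cannot be extracted.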

> Strange signs after pdftohtml parsing.
> --------------------------------------
>
>                 Key: PDFBOX-1038
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1038
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.5.0
>         Environment: windows vista
>            Reporter: Funfel
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.6.0
>
>         Attachments: pg0007.html, pg0007.pdf
>
>
> After converting the PDF to HTML, I get strange characters where ordinary 
> letters are expected (the text is not Chinese or Japanese). I noticed that 
> the font listed for them is UniversPro-Roman-Identity-H. 
> How can I get the text generated properly?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
