[ 
https://issues.apache.org/jira/browse/PDFBOX-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232727#comment-14232727
 ] 

Andreas Lehmkühler edited comment on PDFBOX-2532 at 12/3/14 8:11 AM:
---------------------------------------------------------------------

It looks like we are talking about different PDFs. In "my" PDF none of the 
Type 1 fonts provide an encoding (within the PDF), and the text within the 
content stream is perfectly readable. I've attached a screenshot.


was (Author: lehmi):
It looks like we are talking about different PDFs. In "my" PDFs none of the 
Type 1 fonts provide an encoding (within the PDF), and the text within the 
content stream is perfectly readable. I've attached a screenshot.

> Text extraction fails due to the usage of the internal font mapping
> -------------------------------------------------------------------
>
>                 Key: PDFBOX-2532
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2532
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Andreas Lehmkühler
>             Fix For: 2.0.0
>
>         Attachments: PDFBOX2247-701542.pdf, PDFBOX2247-701542_cp_acrobat.txt, 
> PDFBOX2247-701542_sa_acrobat.txt, PDFBOX2247-701542_sa_acrobat_osx.txt, 
> PDFBOX2247-701542_sa_reader_osx.txt, PDFBOX2247-Debugger.png
>
>
> If a PDF doesn't provide any mapping (neither an encoding nor a ToUnicode 
> mapping), we have to decide where to get a suitable mapping ourselves. We 
> can't use the internal font mapping of the Type 1C font, as it doesn't work 
> in every case; see PDFBOX-2377, which provides a solution for the 1.8 branch.
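
The lookup order described in the issue summary could be sketched roughly as
follows. This is only an illustration, not PDFBox code: the class
FallbackMapping, the method toUnicode and the plain Map tables are
hypothetical names, and the choice of a "default" table as the last resort is
an assumption, since the issue does not say which source should be used. The
font's internal mapping is deliberately never consulted.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not PDFBox API: resolve a character code to Unicode
// by trying the mapping sources in order of trust.
final class FallbackMapping {

    /**
     * @param code            character code read from the content stream
     * @param toUnicodeCMap   ToUnicode mapping from the PDF, or null if absent
     * @param pdfEncoding     encoding given in the PDF, or null if absent
     * @param defaultEncoding assumed fallback table chosen by the extractor
     */
    static Optional<String> toUnicode(int code,
                                      Map<Integer, String> toUnicodeCMap,
                                      Map<Integer, String> pdfEncoding,
                                      Map<Integer, String> defaultEncoding) {
        // 1. An explicit ToUnicode mapping in the PDF wins.
        if (toUnicodeCMap != null && toUnicodeCMap.containsKey(code)) {
            return Optional.of(toUnicodeCMap.get(code));
        }
        // 2. Otherwise use the encoding provided within the PDF.
        if (pdfEncoding != null && pdfEncoding.containsKey(code)) {
            return Optional.of(pdfEncoding.get(code));
        }
        // 3. Neither is present: fall back to the table we picked ourselves,
        //    skipping the font's internal mapping entirely.
        return Optional.ofNullable(defaultEncoding.get(code));
    }
}
```

With this ordering, a PDF that provides neither mapping still yields text as
long as the chosen default table covers the code; an empty result signals
that the code could not be mapped at all.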



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
