[ https://issues.apache.org/jira/browse/PDFBOX-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282952#comment-14282952 ]
John Hewson commented on PDFBOX-2607:
-------------------------------------

This file is not a valid PDF: instead of embedding only the relevant sections of the PFB file, it embeds the entire PFB file, segment headers and all. Acrobat handles this, so we will too, by detecting the PFB header and unpacking the PFB before processing.

> Failed reading embedded Font
> ----------------------------
>
>                 Key: PDFBOX-2607
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2607
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.0
>            Reporter: Holger Floerke
>            Assignee: John Hewson
>         Attachments: 0023-4834_t1_1.pdf
>
>
> Hi,
> I am trying to extract images from the attached PDF. PDF viewers such as Acrobat Reader and the Ubuntu Document Viewer display the PDF correctly, but PDFBox throws an exception:
> {code}
> SCHWERWIEGEND: Can't read the embedded Type1 font GLCNUS+StempelGaramond-Roman
> java.io.IOException: Invalid start of ASCII segment
> 	at org.apache.fontbox.type1.Type1Parser.parseASCII(Type1Parser.java:83)
> 	at org.apache.fontbox.type1.Type1Parser.parse(Type1Parser.java:61)
> 	at org.apache.fontbox.type1.Type1Font.createWithSegments(Type1Font.java:70)
> 	at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:174)
> 	at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:65)
> 	at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:92)
> 	at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:50)
> 	at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:803)
> 	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:465)
> 	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:439)
> 	at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
> 	at org.apache.pdfbox.tools.ExtractImages$ImageGraphicsEngine.run(ExtractImages.java:195)
> 	at org.apache.pdfbox.tools.ExtractImages.extract(ExtractImages.java:174)
> 	at org.apache.pdfbox.tools.ExtractImages.run(ExtractImages.java:139)
> 	at org.apache.pdfbox.tools.ExtractImages.main(ExtractImages.java:83)
> 	at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:59)
> {code}
> Checked with the latest version from git:
> {code}
> java -jar pdfbox-app-2.0.0-SNAPSHOT.jar ExtractImages /home/hf/Downloads/0023-4834_t1_1.pdf
> {code}
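The comment describes the fix only at a high level: detect the PFB header, then unpack the PFB before processing. The following is a minimal Java sketch of that idea. It assumes the standard PFB layout: each segment begins with the byte 0x80, followed by a type byte (1 for ASCII, 2 for binary, 3 for end of file) and, except for end of file, a 4-byte little-endian length. The class `PfbUnpacker` and its `Segments` holder are hypothetical names chosen for illustration. They are not PDFBox's actual patch or API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/**
 * Hypothetical helper (not part of PDFBox/FontBox): detects a PFB-wrapped
 * Type 1 font program and strips the 6-byte segment headers, so the parts
 * can be handed to a parser that expects raw, PDF-style segments.
 */
final class PfbUnpacker {
    private static final int MARKER = 0x80;   // every PFB segment starts with 0x80
    private static final int TYPE_ASCII = 1;  // cleartext PostScript
    private static final int TYPE_BINARY = 2; // eexec-encrypted portion
    private static final int TYPE_EOF = 3;    // end of file, no length field

    /** The three portions a PDF FontFile stream describes via Length1/2/3. */
    static final class Segments {
        final byte[] cleartext;
        final byte[] binary;
        final byte[] trailer;

        Segments(byte[] cleartext, byte[] binary, byte[] trailer) {
            this.cleartext = cleartext;
            this.binary = binary;
            this.trailer = trailer;
        }
    }

    private PfbUnpacker() {
    }

    /** True if the data starts with a PFB ASCII segment header (0x80 0x01). */
    static boolean isPfb(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == MARKER
                && (data[1] & 0xFF) == TYPE_ASCII;
    }

    /** Walks the PFB segments and collects each kind into its own buffer. */
    static Segments unpack(byte[] data) throws IOException {
        ByteArrayOutputStream cleartext = new ByteArrayOutputStream();
        ByteArrayOutputStream binary = new ByteArrayOutputStream();
        ByteArrayOutputStream trailer = new ByteArrayOutputStream();
        boolean seenBinary = false;
        int pos = 0;
        while (pos < data.length) {
            if (pos + 2 > data.length || (data[pos] & 0xFF) != MARKER) {
                throw new IOException("Missing PFB segment marker at offset " + pos);
            }
            int type = data[pos + 1] & 0xFF;
            if (type == TYPE_EOF) {
                break;
            }
            if (pos + 6 > data.length) {
                throw new IOException("Truncated PFB segment header at offset " + pos);
            }
            // Segment length: 4-byte little-endian unsigned integer.
            long length = (data[pos + 2] & 0xFFL)
                    | (data[pos + 3] & 0xFFL) << 8
                    | (data[pos + 4] & 0xFFL) << 16
                    | (data[pos + 5] & 0xFFL) << 24;
            int start = pos + 6;
            if (length > data.length - start) {
                throw new IOException("PFB segment at offset " + pos + " overruns the data");
            }
            int len = (int) length;
            if (type == TYPE_ASCII) {
                // ASCII after the binary part is the trailer (zeros + cleartomark).
                (seenBinary ? trailer : cleartext).write(data, start, len);
            } else if (type == TYPE_BINARY) {
                seenBinary = true;
                binary.write(data, start, len);
            } else {
                throw new IOException("Unknown PFB segment type " + type + " at offset " + pos);
            }
            pos = start + len;
        }
        return new Segments(cleartext.toByteArray(), binary.toByteArray(), trailer.toByteArray());
    }
}
```

A caller would check `isPfb` on the raw font stream bytes. If it returns true, the caller would pass `cleartext` and `binary` to the Type 1 parser instead of the raw stream. The stack trace suggests that the parser otherwise rejects the leading 0x80 0x01 bytes as an invalid start of the ASCII segment. The sketch routes ASCII segments by whether a binary segment has already been seen. This keeps the trailing zeros and `cleartomark` out of the cleartext portion; a PDF FontFile stream counts that material in Length3, not Length1.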