[
https://issues.apache.org/jira/browse/PDFBOX-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856351#comment-15856351
]
Tilman Hausherr edited comment on PDFBOX-3677 at 2/7/17 5:17 PM:
-----------------------------------------------------------------
[~guebeli] please try with a snapshot:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.5-SNAPSHOT/
The change will avoid the NPE, but the file will still cause trouble. You could
extract the (Type 1) font with the PDFDebugger command line application. Just
go to the resources, then the fonts, and click on each font until it fails. Then
search that part of the tree for "FontDescriptor" and then "FontFile"; that is
the font file. Right-click to save it. I suspect that the file is too short.
Please report back on what happens now.
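Once the font file is saved, one rough way to test the "too short" suspicion is
to compare its size with the Length1, Length2 and Length3 entries of the
FontFile stream dictionary (the clear-text, encrypted and fixed-content
portions of a Type 1 font program). The sketch below only illustrates that
comparison: the class and method names are hypothetical and not part of
PDFBox, and the lengths have to be read off the stream dictionary by hand.

```java
/** Hypothetical helper for illustration only; not part of PDFBox. */
public class Type1LengthCheck {

    /**
     * Returns how many bytes are missing from the saved font data compared
     * with the lengths declared in the FontFile stream dictionary, or 0 if
     * the data is at least as long as declared.
     */
    static long missingBytes(byte[] fontData, int length1, int length2, int length3) {
        // Sum as long so that large declared lengths cannot overflow int.
        long declared = (long) length1 + length2 + length3;
        return Math.max(0L, declared - fontData.length);
    }

    /** True if the saved data is shorter than the declared total length. */
    static boolean isTruncated(byte[] fontData, int length1, int length2, int length3) {
        return missingBytes(fontData, length1, length2, length3) > 0;
    }

    public static void main(String[] args) {
        // Stand-in for the bytes saved from PDFDebugger; the sizes are made up.
        byte[] saved = new byte[1200];
        System.out.println("truncated: " + isTruncated(saved, 700, 600, 0));
        System.out.println("missing bytes: " + missingBytes(saved, 700, 600, 0));
    }
}
```

A non-zero result would support the idea that the embedded font program was
cut off, although a matching length does not prove that the font is valid.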
> NullPointerException in Type1Parser.read
> ----------------------------------------
>
> Key: PDFBOX-3677
> URL: https://issues.apache.org/jira/browse/PDFBOX-3677
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.3, 2.0.4
> Environment: Windows 10, java version "1.8.0_25"
> Reporter: Manuel Gübeli
> Fix For: 2.0.5
>
> Attachments: StackTrace.txt
>
>
> Text extraction from certain PDFs is not possible; PDFBox responds with a
> NullPointerException. Text extraction from the same PDFs works with version
> 1.8.13.
> Originally the issue was discovered while using the newest Apache Tika 1.14
> library. I cannot downgrade to PDFBox 1.8.13 with Apache Tika 1.14.
> Unfortunately I cannot provide the failing PDFs. However, I did
> some testing and found out that “Token token = lexer.nextToken();” returns
> null.
> Feb 07, 2017 12:17:40 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
> SEVERE: Can't read the embedded Type1 font AAAAAB+Arial-BoldMT
> java.io.IOException: Found token=null but expected NAME
> Caused by: java.io.EOFException
> at org.apache.pdfbox.io.ScratchFileBuffer.seek(ScratchFileBuffer.java:302)
> at org.apache.pdfbox.pdfparser.COSParser.checkXRefOffset(COSParser.java:1177)
> at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:202)
>