[ https://issues.apache.org/jira/browse/PDFBOX-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr closed PDFBOX-1241.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.8.3

I tested with older versions; the file failed to load in every version before 1.8.3. Since 1.8.3 has already been released, I assume I should close the issue rather than just set it to resolved.

> Better handle of missing offset at the end of a file
> ----------------------------------------------------
>
>                 Key: PDFBOX-1241
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1241
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing, Text extraction
>    Affects Versions: 1.6.0
>         Environment: All platforms affected
>            Reporter: Ernst Eibensteiner
>             Fix For: 1.8.3
>
>         Attachments: On the Insert tab.pdf
>
>
> We came across PDF files that do not have an offset at the end of the file.
> This leads to the following exception:
> c:\tmp> java -jar pdfbox-app-1.6.0.jar ExtractText -endPage 1 "On the Insert tab.pdf"
> ExtractText failed with the following exception:
> java.io.IOException: Error: Expected an integer type, actual=''
>         at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1384)
>         at org.apache.pdfbox.pdfparser.PDFParser.parseStartXref(PDFParser.java:663)
>         at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:464)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:978)
>         at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:196)
>         at org.apache.pdfbox.ExtractText.main(ExtractText.java:76)
>         at org.apache.pdfbox.PDFBox.main(PDFBox.java:42)
> While these PDFs are non-conforming, it'd be an improvement to allow them to be read and processed.
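
For illustration only: the failure in the trace comes from reading an integer after the startxref keyword when none is there. The sketch below shows one way a reader could tolerate that, by returning an empty result instead of throwing. The class LenientStartXref and its method findStartXref are hypothetical names made up for this example; they are not part of PDFBox, and this message does not describe how the actual fix in 1.8.3 was implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.OptionalLong;

// Hypothetical helper, NOT part of PDFBox: reads the startxref offset leniently.
final class LenientStartXref {

    private static final String KEYWORD = "startxref";

    // Returns the integer that follows the last "startxref" keyword, or an
    // empty result when the keyword or the integer is missing.
    static OptionalLong findStartXref(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so indices match byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keywordPos = text.lastIndexOf(KEYWORD);
        if (keywordPos < 0) {
            return OptionalLong.empty();
        }
        int i = keywordPos + KEYWORD.length();
        // Skip whitespace between the keyword and the number.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (i == start) {
            // Keyword present but no digits follow: the case reported in this issue.
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(text.substring(start, i)));
        } catch (NumberFormatException e) {
            // Digit run does not fit in a long; treat it like a missing offset.
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        byte[] ok = "%PDF-1.4\n...\nstartxref\n1234\n%%EOF\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] broken = "%PDF-1.4\n...\nstartxref\n%%EOF\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findStartXref(ok));     // offset present
        System.out.println(findStartXref(broken)); // offset missing, no exception
    }
}
```

A caller that gets an empty result could then fall back to some recovery strategy, such as scanning the file for objects, instead of aborting the load. That fallback is likewise an assumption and is not shown here.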