William Palmer created PDFBOX-1769:
--------------------------------------
Summary: Fix crash on invalid xref
Key: PDFBOX-1769
URL: https://issues.apache.org/jira/browse/PDFBOX-1769
Project: PDFBox
Issue Type: Wish
Components: Parsing
Affects Versions: 1.8.2
Reporter: William Palmer
The parser crashes when the xref start address recorded in the file is invalid. It needs to search for the correct xref start address instead.
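The search described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the PDFBox implementation or a proposed patch: the class XrefLocator and its methods are hypothetical names, the whole file is assumed to be available as a byte array, and only classic "xref" tables are handled (cross-reference streams are ignored). If the declared offset does not point at a standalone "xref" keyword, the sketch picks the nearest one, skipping the "xref" that is the tail of "startxref".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the PDFBox API.
final class XrefLocator {
    private static final byte[] XREF = "xref".getBytes(StandardCharsets.ISO_8859_1);
    private static final byte[] START = "start".getBytes(StandardCharsets.ISO_8859_1);

    private XrefLocator() {}

    // True if needle occurs in data beginning exactly at pos.
    static boolean matchesAt(byte[] data, int pos, byte[] needle) {
        if (pos < 0 || pos + needle.length > data.length) {
            return false;
        }
        for (int i = 0; i < needle.length; i++) {
            if (data[pos + i] != needle[i]) {
                return false;
            }
        }
        return true;
    }

    // True if a standalone "xref" keyword begins at pos,
    // i.e. it is not the tail of "startxref".
    static boolean isXrefKeywordAt(byte[] data, int pos) {
        return matchesAt(data, pos, XREF)
                && !matchesAt(data, pos - START.length, START);
    }

    // Returns declaredOffset if it already points at "xref"; otherwise the
    // offset of the standalone "xref" keyword nearest to it, or -1 if none.
    static int findXrefStart(byte[] data, int declaredOffset) {
        if (isXrefKeywordAt(data, declaredOffset)) {
            return declaredOffset;
        }
        int best = -1;
        for (int pos = 0; pos + XREF.length <= data.length; pos++) {
            if (isXrefKeywordAt(data, pos)
                    && (best < 0
                        || Math.abs(pos - declaredOffset) < Math.abs(best - declaredOffset))) {
                best = pos;
            }
        }
        return best;
    }
}
```

With a declared offset that lands one byte too late (so the parser would read 'ref' where it expects the keyword), findXrefStart returns the offset of the real "xref" keyword just before it. A linear scan is used for simplicity; a real fix would more likely search a window around the declared offset.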
Example file:
http://digitalcorpora.org/corp/nps/files/govdocs1/020/020747.pdf
Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='ref'
    at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1622)
Using the code:
// pFile is the path to the input PDF.
// RandomAccess and RandomAccessFile are the org.apache.pdfbox.io types, not java.io.
PDFTextStripper ts = new PDFTextStripper();
PrintWriter out = new PrintWriter(new FileWriter(new File(pFile + ".txt")));
RandomAccess scratchFile = new RandomAccessFile(File.createTempFile("pdfbox-", ".tmp"), "rw");
PDDocument doc = PDDocument.loadNonSeq(new File(pFile), scratchFile);
ts.setForceParsing(true);
ts.writeText(doc, out);
Related: PDFBOX-1757
--
This message was sent by Atlassian JIRA (v6.1#6144)