William Palmer created PDFBOX-1769:
--------------------------------------

             Summary: Fix crash on invalid xref
                 Key: PDFBOX-1769
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1769
             Project: PDFBox
          Issue Type: Wish
          Components: Parsing
    Affects Versions: 1.8.2
            Reporter: William Palmer


When the xref start address given in the file is invalid, the parser needs to search for the correct one instead of failing.
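A minimal sketch of what such a search could look like, independent of PDFBox and not the fix that was eventually applied: check whether the offset declared after "startxref" really points at an "xref" keyword and, if it does not, scan the file for the standalone "xref" keyword closest to the declared offset. The class and method names (XrefLocator, findXrefOffset, and so on) are hypothetical. The sketch assumes a classic cross-reference table; it does not handle cross-reference streams and could be fooled by the bytes "xref" appearing inside stream data.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: locates a plausible xref table offset.
final class XrefLocator {
    private static final byte[] XREF = "xref".getBytes(StandardCharsets.ISO_8859_1);
    private static final byte[] START = "start".getBytes(StandardCharsets.ISO_8859_1);

    /** Returns true if the bytes starting at offset spell the given keyword. */
    static boolean matchesAt(byte[] data, int offset, byte[] keyword) {
        if (offset < 0 || offset + keyword.length > data.length) {
            return false;
        }
        for (int i = 0; i < keyword.length; i++) {
            if (data[offset + i] != keyword[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns true for a standalone "xref", i.e. not the tail of "startxref". */
    static boolean isXrefKeywordAt(byte[] data, int offset) {
        return matchesAt(data, offset, XREF)
                && !matchesAt(data, offset - START.length, START);
    }

    /**
     * Returns the declared offset if it points at "xref"; otherwise the offset
     * of the standalone "xref" keyword nearest to it, or -1 if none exists.
     */
    static int findXrefOffset(byte[] data, int declaredOffset) {
        if (isXrefKeywordAt(data, declaredOffset)) {
            return declaredOffset;
        }
        int best = -1;
        for (int i = 0; i + XREF.length <= data.length; i++) {
            if (!isXrefKeywordAt(data, i)) {
                continue;
            }
            // Keep the candidate closest to the (wrong) declared offset.
            if (best < 0 || Math.abs(i - declaredOffset) < Math.abs(best - declaredOffset)) {
                best = i;
            }
        }
        return best;
    }
}
```

A caller would read the file into a byte array, parse the integer that follows "startxref", and pass both to findXrefOffset; a result of -1 means no usable table was found and the document would have to be rebuilt by scanning for objects.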

Example file:
http://digitalcorpora.org/corp/nps/files/govdocs1/020/020747.pdf

Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='ref'
    at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1622)

Code used to reproduce:
// pFile holds the path of the input PDF
PDFTextStripper ts = new PDFTextStripper();
PrintWriter out = new PrintWriter(new FileWriter(new File(pFile + ".txt")));
// Scratch file backing the non-sequential parser
RandomAccess scratchFile = new RandomAccessFile(File.createTempFile("pdfbox-", ".tmp"), "rw");
PDDocument doc = PDDocument.loadNonSeq(new File(pFile), scratchFile);
ts.setForceParsing(true);
ts.writeText(doc, out);

Related: PDFBOX-1757



--
This message was sent by Atlassian JIRA
(v6.1#6144)
