[ 
https://issues.apache.org/jira/browse/PDFBOX-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849224#comment-13849224
 ] 

Andreas Lehmkühler edited comment on PDFBOX-1769 at 12/16/13 3:17 PM:
----------------------------------------------------------------------

I've added a fix in revision 1551220. 

- the parser no longer complains about negative offsets, as those 
represent object numbers (type 2 entries in an xref object stream)
- the parser now unreads the right number of bytes if the keyword is split 
into two halves
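
For readers who want to see the two ideas in isolation, here is a minimal 
standalone sketch. It is not the code from revision 1551220. The class 
XrefSketch and its methods are hypothetical names, and the convention that a 
stored value of -n means "compressed in object stream n" is an assumption 
made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; hypothetical names, not the PDFBox implementation. */
final class XrefSketch {

    /** Assumed convention: a negative stored value marks a type 2 (compressed) entry. */
    public static boolean isCompressedEntry(long storedOffset) {
        return storedOffset < 0;
    }

    /** Recovers the object stream number from a negative stored value. */
    public static long objectStreamNumber(long storedOffset) {
        if (storedOffset >= 0) {
            throw new IllegalArgumentException("not a compressed entry: " + storedOffset);
        }
        return -storedOffset;
    }

    /**
     * Tries to consume the keyword from the stream. If the match fails part way
     * through, exactly the bytes read so far are pushed back, so the caller sees
     * the stream as it was before the attempt. The pushback buffer of the stream
     * must be at least as large as the keyword.
     */
    public static boolean consumeKeyword(PushbackInputStream in, String keyword)
            throws IOException {
        byte[] expected = keyword.getBytes(StandardCharsets.ISO_8859_1);
        byte[] read = new byte[expected.length];
        int count = 0;
        boolean matched = true;
        while (count < expected.length) {
            int b = in.read();
            if (b == -1) {               // end of stream before the keyword ended
                matched = false;
                break;
            }
            read[count++] = (byte) b;
            if ((byte) b != expected[count - 1]) {
                matched = false;
                break;
            }
        }
        if (!matched) {
            in.unread(read, 0, count);   // unread only what was actually read
        }
        return matched;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "xreX 0 1".getBytes(StandardCharsets.ISO_8859_1);
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(data), 16);
        System.out.println("matched: " + consumeKeyword(in, "xref"));
        System.out.println("next byte: " + (char) in.read());
        System.out.println("object stream: " + objectStreamNumber(-7));
    }
}
```

The point of the second method is that the number of bytes pushed back is 
derived from what was actually read, not from the full keyword length.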



> Fix crash on invalid xref
> -------------------------
>
>                 Key: PDFBOX-1769
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1769
>             Project: PDFBox
>          Issue Type: Wish
>          Components: Parsing
>    Affects Versions: 1.8.2
>            Reporter: William Palmer
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.8.4, 2.0.0
>
>
> Need to search for a correct xref start address
> Example file:
> http://digitalcorpora.org/corp/nps/files/govdocs1/020/020747.pdf
> Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='ref'
>     at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1622)
> Using the code:
> PDFTextStripper ts = new PDFTextStripper();
> PrintWriter out = new PrintWriter(new FileWriter(new File(pFile + ".txt")));
> // RandomAccess and RandomAccessFile are the org.apache.pdfbox.io types
> RandomAccess scratchFile = new RandomAccessFile(File.createTempFile("pdfbox-", ".tmp"), "rw");
> PDDocument doc = PDDocument.loadNonSeq(new File(pFile), scratchFile);
> ts.setForceParsing(true);
> ts.writeText(doc, out);
> Related: PDFBOX-1757



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
