David KELLER created PDFBOX-1845:
------------------------------------

             Summary: PDDocument.load() gives Error: Expected a long type at offset 1633
                 Key: PDFBOX-1845
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1845
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 1.8.0, 2.0.0
         Environment: Windows 8.1
            Reporter: David KELLER


I ran this simple program on the attached file (a scanned OCR document from Nuance Omnipage 18):

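import java.io.FileInputStream;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Class name taken from the "Start SplitFileTest..." message
public class SplitFileTest {
        public static void main(String[] args)
        throws Exception {
                System.out.println("Start SplitFileTest...");
                String path = "D:\\test\\batch\\scan_manual\\courrier\\david.keller\\";
                String pdfFile = path + "14 01 2014.pdf";

                FileInputStream pdfInputStream = new FileInputStream(pdfFile);
                try {
                        // With the attached file, this call throws the IOException below
                        PDDocument pdDocument = PDDocument.load(pdfInputStream);
                        List<PDPage> pages =
                                pdDocument.getDocumentCatalog().getAllPages();
                        pdDocument.close();
                } finally {
                        pdfInputStream.close();
                }
        }
}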


With version 1.8.0, I get this error:

java.io.IOException: Error: Expected an integer type, actual='12977[373'
        at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1622)
        at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:100)
        at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:604)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1187)
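
Per the PDF specification (ISO 32000-1, section 7.5.7), a decoded object stream begins with N pairs of integers separated by white space: an object number followed by that object's byte offset relative to /First. The 1.8.0 trace fails while reading one of these integers, and the reported token, '12977[373', contains '[', the PDF array delimiter, which can never be part of an integer. The sketch below is a hypothetical, simplified header parser (ObjectStreamHeaderSketch, parseHeader and parseInteger are illustrative names, not PDFBox code) that fails the same way on such a token; it does not explain why the attached file's header contains it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not PDFBox's PDFObjectStreamParser.
public class ObjectStreamHeaderSketch {

    /**
     * Parses the header of a decoded object stream: n pairs of integers,
     * "objectNumber offset". Tokens are split on white space only, so a '['
     * glued to digits stays inside the token. Returns each pair as long[2].
     */
    static List<long[]> parseHeader(String header, int n) throws IOException {
        String[] tokens = header.trim().split("\\s+");
        if (tokens.length < 2 * n) {
            throw new IOException("Header too short: expected " + (2 * n) + " integers");
        }
        List<long[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long objectNumber = parseInteger(tokens[2 * i]);
            long offset = parseInteger(tokens[2 * i + 1]);
            pairs.add(new long[] {objectNumber, offset});
        }
        return pairs;
    }

    // Strict integer parse: a non-digit character such as '[' makes the
    // token invalid, producing a message modeled on the 1.8.0 error above.
    static long parseInteger(String token) throws IOException {
        try {
            return Long.parseLong(token);
        } catch (NumberFormatException e) {
            throw new IOException("Error: Expected an integer type, actual='" + token + "'");
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parseHeader("12 0 13 25", 2).size());
        try {
            parseHeader("12977[373 0", 1);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A well-formed header such as "12 0 13 25" yields two pairs, while "12977[373 0" fails with Error: Expected an integer type, actual='12977[373'.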



I have also just built 2.0.0 from the latest source code, and I get this error:

java.io.IOException: Error: Expected a long type at offset 1633
        at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1682)
        at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:100)
        at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:663)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1101)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)
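
The 2.0.0 build reports only a byte offset (1633) rather than the offending token. When investigating a report like this, it can help to print the bytes around that offset. A minimal sketch follows; it assumes you already hold the relevant bytes in a byte[] (extracting the decoded object stream is not shown) and that 1633 is a position in that same buffer, which the trace itself does not state. OffsetContextSketch and window are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical debugging helper, not part of PDFBox.
public class OffsetContextSketch {

    /**
     * Returns a printable view of the bytes in [offset - radius, offset + radius),
     * clamped to the array bounds. Non-printable bytes are shown as '.', and
     * '|' is inserted before the byte at offset when that byte is in the window.
     */
    static String window(byte[] data, int offset, int radius) {
        int start = Math.max(0, offset - radius);
        int end = Math.min(data.length, offset + radius);
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++) {
            if (i == offset) {
                sb.append('|');
            }
            int b = data[i] & 0xFF;
            sb.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = "1 0 2 5 12977[373".getBytes(StandardCharsets.US_ASCII);
        System.out.println(window(sample, 8, 4));
    }
}
```

On the 17-byte sample in main, window(sample, 8, 4) returns "2 5 |1297", with | marking offset 8.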



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
