[jira] [Resolved] (PDFBOX-773) expected='obj' actual='o' error while parsing the attached PDF

Timo Boehme (JIRA) Mon, 21 May 2012 15:29:44 -0700

     [ https://issues.apache.org/jira/browse/PDFBOX-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Timo Boehme resolved PDFBOX-773.
--------------------------------

    Resolution: Won't Fix
      Assignee: Timo Boehme

The provided PDF document is broken.

Typically this kind of problem stems from the sequential parsing done by 
PDFParser and can be resolved by using NonSequentialPDFParser (option -nonSeq 
in some tools). However, the provided document is broken, at least in its xref 
sections: several xref lines are split by an extra \n (NonSequentialPDFParser 
will point you to the problematic offset).
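For illustration, here is a minimal Java sketch (not PDFBox code) of why a stray \n inside the xref table is fatal. It assumes the fixed layout the PDF specification prescribes for cross-reference entries: exactly 20 bytes each, made of a 10-digit byte offset, a space, a 5-digit generation number, a space, `n` or `f`, and a two-byte end-of-line. The class name `XrefEntryCheck` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical checker (not part of PDFBox) for the entries of one xref
 * subsection. Each entry must be exactly 20 bytes:
 * "nnnnnnnnnn ggggg n" plus a two-byte EOL (" \r", " \n" or "\r\n").
 */
final class XrefEntryCheck {
    static final int ENTRY_LENGTH = 20;

    // 18 visible characters followed by one of the three allowed 2-byte EOLs.
    private static final Pattern ENTRY =
            Pattern.compile("\\d{10} \\d{5} [nf](?: \\r| \\n|\\r\\n)");

    /** Returns true if the given 20 characters form a well-formed entry. */
    static boolean isValidEntry(String entry) {
        return entry.length() == ENTRY_LENGTH && ENTRY.matcher(entry).matches();
    }

    /**
     * Cuts the body of one xref subsection (the text after its
     * "firstObj count" header line) into fixed 20-byte slots and returns
     * the relative offsets of the slots that are not well formed.
     */
    static List<Integer> findBadEntries(String body, int count) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int start = i * ENTRY_LENGTH;
            int end = start + ENTRY_LENGTH;
            if (end > body.length() || !isValidEntry(body.substring(start, end))) {
                bad.add(start);
            }
        }
        return bad;
    }
}
```

Because a reader locates entry i at i * 20 bytes into the subsection, one extra byte shifts every following slot, so all entries after the first split line are misaligned as well. Reporting the first bad offset, as the comment above says NonSequentialPDFParser does, is therefore the most useful diagnostic.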

Other readers will silently try to reconstruct the object references, which 
might result in content errors. PDFBox does not have a special object structure 
reconstruction mode (only the standard PDFParser). Such an xref repair tool 
would make it possible to parse even broken documents with 
NonSequentialPDFParser. This, however, would be a feature request. At least the 
provided document would be a good test case for such a tool.
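A brute-force reconstruction of the kind described above could look roughly like the following Java sketch. It is an assumption about how such a repair tool might work, not a description of PDFBox or of any other reader: it ignores the xref table entirely, scans the raw bytes for "objNum genNum obj" headers, and records where each object starts, letting later definitions override earlier ones as incremental updates would. It does not handle objects stored inside object streams and can be fooled by matching text inside stream data, which is one way silent reconstruction can lead to content errors. `XrefRebuilder` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical xref repair sketch (not part of PDFBox): rebuilds an
 * object-number -> byte-offset table by scanning for object headers
 * instead of trusting a possibly broken xref table.
 */
final class XrefRebuilder {
    // An object header at the start of a line: object number, generation, "obj".
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?m)^(\\d+)\\s+(\\d+)\\s+obj\\b");

    static Map<Integer, Long> rebuild(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so a char index
        // in this string equals the byte offset in the file.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<Integer, Long> offsets = new TreeMap<>();
        Matcher m = OBJ_HEADER.matcher(text);
        while (m.find()) {
            int objNum = Integer.parseInt(m.group(1));
            // A later header for the same object number overwrites the earlier
            // one, mimicking incremental updates appended to the file.
            offsets.put(objNum, (long) m.start());
        }
        return offsets;
    }
}
```

The scan is deliberately naive: it trades correctness in edge cases for never depending on the damaged xref data, which is why a real repair mode would still need to validate each recovered offset.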
                
> expected='obj' actual='o' error while parsing the attached PDF
> --------------------------------------------------------------
>
>                 Key: PDFBOX-773
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-773
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.2.0, 1.3.1, 1.6.0
>         Environment: Sun JDK 6u21, Windows 7 x86
>            Reporter: Marin Nozhchev
>            Assignee: Timo Boehme
>         Attachments: Andersens_Fairy_Tales.zip, test_with_1.6.0_full.txt
>
>
> Parsing the attached PDF fails with the following error:
> Caused by: java.io.IOException: expected='obj' actual='o' org.apache.pdfbox.io.PushBackInputStream@11d75b9
>       at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:509)
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:179)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:859)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:826)
>         ...
> The same error appears with the 1.1 and 1.2 releases and with the latest 1.3 
> trunk so far (svn rev. 962879).
> The file opens without warnings or any visible issues in the latest versions 
> of Foxit Reader and Acrobat Reader on Windows. The parsing was done via the 
> Apache Tika Parser.
> Thank you

