Cheng Zhong created PDFBOX-4097:
-----------------------------------

             Summary: Compressed objects are lost when brute-force search 
fails to handle compressed streams
                 Key: PDFBOX-4097
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4097
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 2.0.8
            Reporter: Cheng Zhong
         Attachments: 奥美医疗-IPO.pdf

Compressed objects described in cross-reference streams are lost when the 
brute-force search fails to handle such streams.

The attached PDF contains an object 1336, but its offset points to 
object 1828. This inconsistency triggers a brute-force search (initiated by 
*COSParser.checkXrefOffsets*).
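
The kind of consistency check that detects such a mismatch can be sketched as follows. This is a minimal illustration, not the actual PDFBox implementation: the class name *OffsetCheckSketch* and the method *headerMatches* are hypothetical, and the check is simplified to a plain byte comparison against the expected "N G obj" header.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: verifies that the bytes at a given
// xref offset start with the object header the xref entry claims to point to.
final class OffsetCheckSketch {

    static boolean headerMatches(byte[] pdf, int offset, long objNum, int gen) {
        // An indirect object starts with "<object number> <generation> obj".
        byte[] expected = (objNum + " " + gen + " obj")
                .getBytes(StandardCharsets.ISO_8859_1);
        if (offset < 0 || offset + expected.length > pdf.length) {
            return false; // offset lies outside the file
        }
        for (int i = 0; i < expected.length; i++) {
            if (pdf[offset + i] != expected[i]) {
                return false; // a different object (or garbage) lives here
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.5\n1828 0 obj\n<<>>\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        // The xref entry for 1336 points at offset 9, where 1828 is stored.
        System.out.println(headerMatches(pdf, 9, 1336, 0));
        System.out.println(headerMatches(pdf, 9, 1828, 0));
    }
}
```

In the attached file, a check of this kind fails for object 1336 because the bytes at its offset belong to object 1828.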

During the search (in *bfSearchForObjStreams*), object streams 1828, 1829 and 
1830 failed to decompress because the streams were "corrupted" (the *Params* 
field was missing from the dictionary, or the *Filter* was wrong). As a result, 
the 462 compressed objects described in the cross-reference streams are lost. 
Because important objects (the Root, the Pages, etc.) refer to objects in 1828 
and the other failed streams, they all resolve to null (the corrected 
XRefOffsets doesn't contain them). Further parsing is impossible.

However, when I bypassed *checkXrefOffsets*, the PDF displayed correctly 
without any (noticeable) errors. It seems that object 1336 is not used in the 
PDF.

"Corrupted" 1828:
{code:java}
1828 0 obj
<<
/Length 2176
/Type /ObjStm
/N 200
/First 2103
/Filter /FlatDecode
>>
...{code}
This stream cannot be handled by *bfSearchForObjStreams*, but it can be 
handled by *parseObjectStream*.

 

Would it be worthwhile to have a fallback that preserves the compressed stream 
object key offsets when we hit an error during the brute-force search?
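
One possible shape for such a fallback is sketched below. It is only an illustration under stated assumptions, not a patch against PDFBox: the class *XrefFallbackSketch* and the method *mergeWithFallback* are hypothetical, object keys are simplified to "number generation" strings, and a negative value -n is assumed to mean "compressed inside object stream n".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not PDFBox code: keeps the compressed-object entries of
// the original xref table when the brute-force search could not rebuild them.
final class XrefFallbackSketch {

    // Assumption of this sketch: a positive value is a byte offset, a negative
    // value -n marks an object compressed inside object stream n.
    static Map<String, Long> mergeWithFallback(Map<String, Long> original,
                                               Map<String, Long> bruteForce) {
        // Start from the brute-force result, which has the corrected offsets.
        Map<String, Long> merged = new HashMap<>(bruteForce);
        for (Map.Entry<String, Long> entry : original.entrySet()) {
            boolean compressed = entry.getValue() < 0;
            // Preserve only compressed entries the search failed to recover;
            // uncompressed entries with suspicious offsets stay dropped.
            if (compressed && !merged.containsKey(entry.getKey())) {
                merged.put(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> original = new HashMap<>();
        original.put("5 0", -1828L);   // compressed in object stream 1828
        original.put("1336 0", 100L);  // wrong offset, rejected by the check

        Map<String, Long> bruteForce = new HashMap<>();
        bruteForce.put("1828 0", 4000L); // stream located, but not decoded

        System.out.println(mergeWithFallback(original, bruteForce));
    }
}
```

With this approach the entries for objects inside 1828 would survive, so a later, more lenient pass such as *parseObjectStream* could still try to resolve them.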



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
