[
https://issues.apache.org/jira/browse/PDFBOX-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Lehmkühler closed PDFBOX-1014.
--------------------------------------
Assignee: Andreas Lehmkühler
> Unused XRef object streams cause parser to fail + FIX
> -----------------------------------------------------
>
> Key: PDFBOX-1014
> URL: https://issues.apache.org/jira/browse/PDFBOX-1014
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.6.0
> Reporter: Timo Boehme
> Assignee: Andreas Lehmkühler
>
> I have a PDF document with 3 XRef streams (no xref table; PDF version 1.6).
> Currently PDFBox reads and parses all 3 streams in the order they appear and
> combines the data in one dictionary (thus attributes specified in a later XRef
> stream overwrite attributes from earlier streams). The problem with my document
> is that the first 2 XRef streams declare document encryption while the last
> one does not. Furthermore, the last one uses a different document ID, so trying
> to decrypt the document would fail because of the mismatched IDs (in fact,
> parsing of the stream in the first XRef object already fails).
> The solution I came up with is to first collect all XRef streams, then start
> at the last one, check whether it contains a 'Prev' key, and walk backwards
> through the list for as long as that key is present. This should work in most
> cases, assuming that multiple active XRef sections appear in order without an
> unused XRef section in between. A really correct solution would have to test
> object byte positions (which would require storing the byte position of each
> object).
> The fix in COSDocument.parseXrefStreams():
> public void parseXrefStreams() throws IOException
> {
>     COSDictionary trailerDict = new COSDictionary();
>
>     // use only the last XRef and the XRefs referenced by a used XRef via 'Prev';
>     // we assume that 'Prev' references the next preceding xref object
>     // (otherwise we would have to use object byte positions)
>     List<COSObject> xrefStreams = getObjectsByType( "XRef" );
>     int firstXRefIdx = xrefStreams.size() - 1;
>     while ( firstXRefIdx > 0 )
>     {
>         COSStream stream = (COSStream)xrefStreams.get( firstXRefIdx ).getObject();
>         if ( stream.getInt( COSName.PREV, -1 ) == -1 )
>         {
>             // no 'Prev' key; current xref object will be the first one we use
>             break;
>         }
>         // 'Prev' is present: the preceding xref object is in use as well
>         firstXRefIdx--;
>     }
>
>     // for( COSObject xrefStream : getObjectsByType( "XRef" ) )
>     // Math.max guards against a document without any XRef stream (index -1)
>     for ( int xrefIdx = Math.max( firstXRefIdx, 0 ), len = xrefStreams.size();
>           xrefIdx < len; xrefIdx++ )
>     {
>         COSStream stream = (COSStream)xrefStreams.get( xrefIdx ).getObject();
>         trailerDict.addAll(stream);
>         PDFXrefStreamParser parser =
>             new PDFXrefStreamParser(stream, this, forceParsing);
>         parser.parse();
>     }
>     setTrailer( trailerDict );
> }
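
The description mentions that a really correct solution would compare byte positions instead of relying on list order. A minimal sketch of that idea follows, under stated assumptions: the byte offset of every XRef section and the value of its 'Prev' entry are already known, and the offset of the newest section (the value after 'startxref') is given. The class XrefChain, the method activeSections and the NO_PREV marker are hypothetical names used for illustration only; none of them is part of PDFBox.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not PDFBox API: selects the XRef sections that are
// actually in use by following 'Prev' byte offsets from the newest section.
final class XrefChain {
    /** Marker meaning "this section has no 'Prev' entry". */
    static final long NO_PREV = -1L;

    /**
     * @param prevByOffset byte offset of each XRef section mapped to the value
     *                     of its 'Prev' entry, or NO_PREV if it has none
     * @param startOffset  byte offset of the newest XRef section
     * @return offsets of the sections in use, oldest first, so that merging
     *         them in this order lets newer entries overwrite older ones
     */
    static List<Long> activeSections(Map<Long, Long> prevByOffset, long startOffset) {
        Deque<Long> chain = new ArrayDeque<>();
        Set<Long> seen = new HashSet<>(); // protects against cyclic 'Prev' chains
        long offset = startOffset;
        while (offset != NO_PREV
                && prevByOffset.containsKey(offset)
                && seen.add(offset)) {
            chain.addFirst(offset); // older sections end up at the front
            offset = prevByOffset.get(offset);
        }
        return new ArrayList<>(chain);
    }

    public static void main(String[] args) {
        Map<Long, Long> prevByOffset = new HashMap<>();
        prevByOffset.put(100L, NO_PREV);  // unused section, never referenced
        prevByOffset.put(5000L, NO_PREV); // oldest section still in use
        prevByOffset.put(9000L, 5000L);   // newest section, points back to 5000
        System.out.println(activeSections(prevByOffset, 9000L)); // prints [5000, 9000]
    }
}
```

Unlike the index-based loop in the fix, this variant skips an unused section even when it lies between two active ones, at the cost of having to record a byte offset for each XRef object while parsing.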