[jira] [Resolved] (PDFBOX-383) BaseParser incorrectly handling stream, exhibiting IOException

JIRA Fri, 18 May 2012 09:59:30 -0700

     [ 
https://issues.apache.org/jira/browse/PDFBOX-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andreas Lehmkühler resolved PDFBOX-383.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.7.0
         Assignee: Andreas Lehmkühler

The attached PDFs work fine with the new non-sequential parser; see PDFBOX 
for details.
                
> BaseParser incorrectly handling stream, exhibiting IOException
> --------------------------------------------------------------
>
>                 Key: PDFBOX-383
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-383
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 0.7.3
>         Environment: PDFBox 0.7.3 with Java 5 running on a Windows platform
>            Reporter: Son
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.7.0
>
>         Attachments: BaseParser.java, fail.pdf
>
>
> When loading a PDF file that contains a file attachment annotation, errors can 
> occur when two conditions hold:
> - the Length value in the dictionary of the F stream is an indirect reference 
> to an integer value
> - the content of the filtered stream contains the word 'endstream'
> Typically this occurs when the PDF file contains a stream description such as 
> the following:
> 12 0 obj
> << /Length 16 0 R
> /Filter /FlateDecode
> >>
> stream
> {content}
> endstream
> endobj
> ...
> 16 0 obj
> {length}
> endobj
> ....
> and the (filtered) {content} contains the string "endstream"
> (see line 3700 of the attachment).
> The problem is related to the way stream content is (always) read by the method 
> readUntilEndStream(), which stops at the first 'endstream' sequence.
> A (partial) fix was made that reads the stream content in three different ways:
> - if the Length is known (it is a direct object), {length} bytes are 
> read and written to the stream's FilteredStream
> - if the Length is unknown and the filter is FlateFilter, the code 
> decodes the data (the FlateDecode algorithm does not require knowing the 
> length of the encoded data ahead of time) and assigns the result to the 
> stream's unfiltered stream
> - otherwise, the current behavior is kept
> Running the modified code on the files that exhibited errors fixed the 
> problems that were encountered.
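
The failure mode and the first two read strategies described in the report can be reproduced with the JDK alone. The following is a minimal sketch, not the attached BaseParser.java patch and not PDFBox API: the class StreamReadSketch and all of its method names are hypothetical, and the sample stream is deflated with stored (uncompressed) blocks only so that the literal bytes "endstream" are guaranteed to appear inside the filtered data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration; none of these names come from PDFBox.
final class StreamReadSketch {
    static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);
    static final byte[] STREAM_KEYWORD = "stream\n".getBytes(StandardCharsets.US_ASCII);

    // Index of the first occurrence of needle in data at or after from, or -1.
    static int indexOf(byte[] data, byte[] needle, int from) {
        for (int i = Math.max(from, 0); i <= data.length - needle.length; i++) {
            int j = 0;
            while (j < needle.length && data[i + j] == needle[j]) j++;
            if (j == needle.length) return i;
        }
        return -1;
    }

    // Behavior described in the report: stop at the first "endstream" bytes.
    static byte[] readUntilFirstEndStream(byte[] file, int start) {
        int end = indexOf(file, ENDSTREAM, start);
        if (end < 0) throw new IllegalArgumentException("no endstream keyword found");
        return Arrays.copyOfRange(file, start, end);
    }

    // Strategy 1: the length is known, so copy exactly that many bytes.
    static byte[] readByLength(byte[] file, int start, int length) {
        if (length < 0 || start + length > file.length) {
            throw new IllegalArgumentException("length runs past end of data");
        }
        return Arrays.copyOfRange(file, start, start + length);
    }

    // Strategy 2: the length is unknown but the data is Flate-encoded.
    // The zlib stream marks its own end, so inflate until the decoder says so.
    static byte[] inflateUnknownLength(byte[] file, int start) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(file, start, file.length - start);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported Flate data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid Flate data", e);
        } finally {
            inflater.end();
        }
    }

    // Test helper: zlib-encode raw at the given compression level.
    static byte[] deflate(byte[] raw, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Test helper: "stream\n" + filtered bytes + "\nendstream\n".
    static byte[] wrapAsStream(byte[] filtered) {
        byte[] tail = "\nendstream\n".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(STREAM_KEYWORD, 0, STREAM_KEYWORD.length);
        out.write(filtered, 0, filtered.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = "abc endstream xyz".getBytes(StandardCharsets.US_ASCII);
        byte[] filtered = deflate(payload, Deflater.NO_COMPRESSION);
        byte[] file = wrapAsStream(filtered);
        int start = STREAM_KEYWORD.length;

        System.out.println("filtered length:    " + filtered.length);
        System.out.println("naive scan length:  " + readUntilFirstEndStream(file, start).length);
        System.out.println("inflated matches:   "
                + Arrays.equals(inflateUnknownLength(file, start), payload));
    }
}
```

In this sketch the naive scan returns fewer bytes than were actually written, which is the truncation the report attributes to readUntilEndStream(). The length-based copy and the self-terminating inflate both ignore any "endstream" bytes inside the data. The third strategy in the report (keep the current behavior) corresponds to readUntilFirstEndStream here.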

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
