Done
https://issues.apache.org/jira/browse/PDFBOX-1232

Dave Smith
Candata Ltd.
416-493-9020x2413
Direct: 416-855-2413



On Tue, Feb 21, 2012 at 2:51 AM, Timo Boehme <timo.boe...@ontochem.com> wrote:
> Could you please open a JIRA issue for this problem?
>
>
> Thanks,
>
> Timo
>
>
> Am 21.02.2012 02:44, schrieb Dave Smith:
>>
>> I am having some problems with certain PDF documents and the
>> FlateDecode filter. For example, take a chunk like this:
>>
>> 3 0 obj
>> <<
>> /Type /XObject
>> /Subtype /Form
>> /FormType 1
>> /Resources<<  /Font 4 0 R
>> /ProcSet [/PDF /ImageC /Text]>>
>> /BBox [0 0 595 842]
>> /Matrix [1 0 0 1 0 0]
>> /Filter /FlateDecode
>> /Length 5>>
>> stream
>> H<89>^C^@
>> endstream
>> endobj
>>
>> The blob is 72, -119, 3, 0, 13 decimal.
>>
>> If I run it through the jcraft zlib decoder it works (the result is an
>> empty string, but that is beside the point). In the latest trunk it
>> throws an end-of-data exception. The problem is that the compressed
>> chunk ends without a terminating bit in the stream, hence the EOF.
>> According to the deflate spec the terminator is not required, so I
>> would consider this a bug in Java's InflaterInputStream.
>>
>>
>> I recoded the decoder and it seems to work in all my test cases, which
>> include zlib streams both with and without Z_STREAM_END set. The code
>> is below ...
>>
>>
>>
>>  // needs java.io.* plus java.util.zip.Inflater and
>>  // java.util.zip.DataFormatException
>>  protected ByteArrayOutputStream decompress(InputStream in)
>>        throws IOException, DataFormatException
>>    {
>>        ByteArrayOutputStream out = new ByteArrayOutputStream();
>>        byte[] buf = new byte[1000];
>>        Inflater inflater = new Inflater();
>>        int read = in.read(buf);
>>        // read() returns -1 at end of stream, so test for <= 0
>>        if (read <= 0)
>>        {
>>                return out;
>>        }
>>        inflater.setInput(buf, 0, read);
>>        byte[] res = new byte[1000];
>>        while (true)
>>        {
>>                int resRead = inflater.inflate(res);
>>                if (resRead != 0)
>>                {
>>                        out.write(res, 0, resRead);
>>                        continue;
>>                }
>>                // stop at the end of the stream, on a dictionary request,
>>                // or when more input is needed but none is available
>>                if (inflater.finished() || inflater.needsDictionary()
>>                        || (inflater.needsInput() && in.available() == 0))
>>                {
>>                        return out;
>>                }
>>                if (inflater.needsInput())
>>                {
>>                        read = in.read(buf);
>>                        // guard against -1 before handing it to setInput
>>                        if (read <= 0)
>>                        {
>>                                return out;
>>                        }
>>                        inflater.setInput(buf, 0, read);
>>                }
>>        }
>>    }
>>
>>
>> and then
>> FlateFilter.decode(InputStream compressedData, OutputStream result,
>> COSDictionary options, int filterIndex )
>>
>> looks like
>>
>>
>>  if (compressedData.available()>  0)
>>            {
>>                try
>>                {
>>                        baos =  decompress(compressedData);
>>                }
>> if (predictor==-1 || predictor == 1 )
>>                {
>>                   result.write(baos.toByteArray());
>>                }
>> else
>> {
>>  // use the ByteArrayOutputStream as before ...
>> }
>>
>>
>> Thoughts?
>>
>>
>>
>> Dave Smith
>> Candata Ltd.
>> 416-493-9020x2413
>> Direct: 416-855-2413
>
>
>
> --
>
>  Timo Boehme
>  OntoChem GmbH
>  H.-Damerow-Str. 4
>  06120 Halle/Saale
>  T: +49 345 4780474
>  F: +49 345 4780471
>  timo.boe...@ontochem.com
>
> _____________________________________________________________________
>
>  OntoChem GmbH
>  Geschäftsführer: Dr. Lutz Weber
>  Sitz: Halle / Saale
>  Registergericht: Stendal
>  Registernummer: HRB 215461
> _____________________________________________________________________
>
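
For anyone who wants to reproduce the behaviour described in the quoted message outside of PDFBox, here is a minimal standalone sketch. It only uses java.util.zip; the class and method names (TruncatedFlateDemo, strictThrowsEof, lenientInflate) are made up for illustration and are not PDFBox API. It feeds the first four bytes of the blob (72, -119, 3, 0) through InflaterInputStream and through a bare Inflater that simply stops when it runs out of input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// Hypothetical demo class, not part of PDFBox.
class TruncatedFlateDemo {
    // First four bytes of the blob from the quoted message.
    static final byte[] BLOB = {72, -119, 3, 0};

    // Strict path: true if InflaterInputStream gives up with an EOFException.
    static boolean strictThrowsEof(byte[] data) throws IOException {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[1000];
            while (in.read(buf) != -1) {
                // discard the output, we only care whether reading succeeds
            }
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    // Lenient path: keep whatever was inflated before the input ran out.
    static byte[] lenientInflate(byte[] data) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] res = new byte[1000];
            while (true) {
                int n = inflater.inflate(res);
                if (n == 0) {
                    // finished, needs a dictionary, or out of input: stop here
                    break;
                }
                out.write(res, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("InflaterInputStream threw EOFException: " + strictThrowsEof(BLOB));
        System.out.println("Bytes recovered by bare Inflater: " + lenientInflate(BLOB).length);
    }
}
```

The lenient variant is the same idea as the decompress() method in the quoted message, reduced to the case where all input is already in memory. Whether silently accepting a stream that ends early is the right default for PDFBox is a separate question for the JIRA issue.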
