Done https://issues.apache.org/jira/browse/PDFBOX-1232

Dave Smith
Candata Ltd.
416-493-9020x2413
Direct: 416-855-2413

On Tue, Feb 21, 2012 at 2:51 AM, Timo Boehme <timo.boe...@ontochem.com> wrote:
> Could you please open a JIRA issue for this problem?
>
> Thanks,
> Timo
>
> On 21.02.2012 02:44, Dave Smith wrote:
>>
>> I am having some problems with certain PDF documents and the FlateDecoder.
>> For example, a chunk like this ...
>>
>> 3 0 obj
>> <<
>> /Type /XObject
>> /Subtype /Form
>> /FormType 1
>> /Resources<< /Font 4 0 R
>> /ProcSet [/PDF /ImageC /Text]>>
>> /BBox [0 0 595 842]
>> /Matrix [1 0 0 1 0 0]
>> /Filter /FlateDecode
>> /Length 5>>
>> stream
>> H<89>^C^@
>> endstream
>> endobj
>>
>> The blob is 72, -119, 3, 0, 13 decimal.
>>
>> If I run it through the jcraft zlib decoder it works (the result is an
>> empty string, but that is beside the point). In the latest trunk it
>> throws an end-of-data exception. The problem is that the encoded chunk
>> ends without a terminating bit in the stream, and thus the EOF. According
>> to the deflate spec the terminator is not required, so I would consider
>> this a bug in the Java InflaterInputStream.
>>
>> I recoded the decoder and it seems to work in all my test cases, where I
>> had zlib streams with and without Z_STREAM_END set. The code is
>> below ...
>>
>> protected ByteArrayOutputStream decompress(InputStream in)
>>         throws IOException, DataFormatException
>> {
>>     ByteArrayOutputStream out = new ByteArrayOutputStream();
>>     byte[] buf = new byte[1000];
>>     Inflater inflater = new Inflater();
>>     int read = in.read(buf);
>>     if (read <= 0)
>>     {
>>         // read() returns -1 at end of stream, so there is nothing to inflate
>>         return out;
>>     }
>>     inflater.setInput(buf, 0, read);
>>     byte[] res = new byte[1000];
>>     while (true)
>>     {
>>         int resRead = inflater.inflate(res);
>>         if (resRead != 0)
>>         {
>>             out.write(res, 0, resRead);
>>             continue;
>>         }
>>         // stop at the end of the stream, or when the input ends early
>>         if (inflater.finished() || inflater.needsDictionary()
>>                 || (inflater.needsInput() && in.available() == 0))
>>         {
>>             out.close();
>>             return out;
>>         }
>>         if (inflater.needsInput())
>>         {
>>             read = in.read(buf);
>>             if (read <= 0)
>>             {
>>                 // no more input; return what has been decoded so far
>>                 return out;
>>             }
>>             inflater.setInput(buf, 0, read);
>>         }
>>     }
>> }
>>
>> and then
>> FlateFilter.decode(InputStream compressedData, OutputStream result,
>> COSDictionary options, int filterIndex)
>> looks like
>>
>> if (compressedData.available() > 0)
>> {
>>     try
>>     {
>>         baos = decompress(compressedData);
>>     }
>>     catch (DataFormatException e)
>>     {
>>         // surface corrupt data as an I/O error
>>         throw new IOException(e);
>>     }
>>     if (predictor == -1 || predictor == 1)
>>     {
>>         result.write(baos.toByteArray());
>>     }
>>     else
>>     {
>>         // use the ByteArrayOutputStream as before ...
>>     }
>> }
>>
>> Thoughts?
>>
>> Dave Smith
>
> --
> Timo Boehme
> OntoChem GmbH
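
For anyone who wants to try the idea outside PDFBox, the tolerant loop from the quoted message can be sketched as a standalone class. This is only a minimal sketch, not the code that went into the JIRA issue: the class name TolerantInflate and the method name inflate are made up for illustration, and it assumes that returning whatever was decoded before the input ran out is acceptable for the caller. It uses only java.util.zip.Inflater from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper, not part of PDFBox.
final class TolerantInflate {

    // Inflates a zlib stream and returns whatever could be decoded,
    // even if the input ends before the inflater reports finished().
    static byte[] inflate(InputStream in) throws IOException, DataFormatException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Inflater inflater = new Inflater();
        byte[] inBuf = new byte[1024];
        byte[] outBuf = new byte[1024];
        try {
            while (!inflater.finished() && !inflater.needsDictionary()) {
                if (inflater.needsInput()) {
                    int read = in.read(inBuf);
                    if (read < 0) {
                        // Input ended early: keep what we have instead of failing.
                        break;
                    }
                    inflater.setInput(inBuf, 0, read);
                }
                int n = inflater.inflate(outBuf);
                out.write(outBuf, 0, n);
            }
        } finally {
            inflater.end(); // release native zlib resources
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // The five-byte blob from the object quoted in this thread.
        byte[] blob = {72, -119, 3, 0, 13};
        byte[] decoded = inflate(new ByteArrayInputStream(blob));
        System.out.println("decoded " + decoded.length + " bytes");
    }
}
```

Unlike the quoted version, this sketch does not consult in.available(); it relies on read() returning -1 to detect the end of the input, which also covers streams that do not report available bytes reliably.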