On 15 Feb 2014, at 12:47 PM, Oleg Kalnichevski <[email protected]> wrote:
> The problem mostly likely has been introduced by HTTPCLIENT-1432 [1]. I
> reviewed the patch once more and could not find anything obviously wrong
> with it. Try reverting the changes introduced by HTTPCLIENT-1432 and see
Subclassing InputStream without overriding read(byte[],int,int) is a recipe
for disaster: the inherited method reads one byte at a time. You can see
what's happening in this hprof trace:
java.util.zip.Inflater.inflateBytes(Inflater.java:Unknown line)
java.util.zip.Inflater.inflate(Inflater.java:259)
java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)
org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:56)
java.io.InputStream.read(InputStream.java:179)
it.unimi.di.law.warc.util.InspectableCachedHttpEntity.copyContent(InspectableCachedHttpEntity.java:67)
copyContent() wants to fill a buffer with read(byte[],int,int), but since
LazyDecompressingInputStream doesn't override it, the call lands on the
inherited method in InputStream, which reads one byte at a time. For each
byte it calls the one-byte read() method of LazyDecompressingInputStream,
which invokes the one-byte read() method of InflaterInputStream, which does a
multi-byte, length-one read on GZIPInputStream, which makes a similar call on
InflaterInputStream, which in turn goes through the native inflateBytes()
method. So every single byte pays for the whole call chain, native call
included.
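To make the inherited behaviour concrete, here is a minimal sketch that counts
how often the one-byte read() is hit when a caller asks for a whole buffer.
CountingStream and ByteByByteDemo are hypothetical names made up for this
illustration, not HttpClient classes; the only thing the sketch relies on is
the documented default of java.io.InputStream.read(byte[],int,int), which
calls read() repeatedly.

```java
import java.io.IOException;
import java.io.InputStream;

public class ByteByByteDemo {

    /** Hypothetical stream that overrides only the one-byte read() and counts calls. */
    static class CountingStream extends InputStream {
        int singleByteCalls = 0;
        private int remaining;

        CountingStream(int size) {
            this.remaining = size;
        }

        @Override
        public int read() throws IOException {
            singleByteCalls++;
            if (remaining == 0) {
                return -1; // end of stream
            }
            remaining--;
            return 'x';
        }
        // read(byte[],int,int) is deliberately NOT overridden.
    }

    /** Drains a stream of the given size with a buffer and reports the read() call count. */
    static int countCalls(int size, int bufferSize) throws IOException {
        CountingStream in = new CountingStream(size);
        byte[] buf = new byte[bufferSize];
        // The caller thinks it is doing buffered reads...
        while (in.read(buf, 0, buf.length) != -1) {
            // discard the data; only the call count matters here
        }
        // ...but each buffered byte cost one call to the one-byte read().
        return in.singleByteCalls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("one-byte read() calls: " + countCalls(8192, 4096));
    }
}
```

In this toy case a call is cheap; in the trace above each of those calls ends
in a native inflate.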
The result is a 10-50x decrease in speed, but I think that a trivial override
of read(byte[],int,int) in LazyDecompressingInputStream will solve the problem.
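
A rough sketch of the kind of override I have in mind follows. This is not the
actual HttpClient source: LazyGzipInputStream, delegate() and the helper
methods are hypothetical names, and I am assuming the wrapper simply creates a
GZIPInputStream on first use. The point is only that the bulk read is
forwarded to the wrapped stream instead of falling back to InputStream's loop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class LazyGzipSketch {

    /** Hypothetical lazily-initialised gzip wrapper (stand-in, not the real class). */
    static class LazyGzipInputStream extends InputStream {
        private final InputStream compressed;
        private InputStream decompressed; // created on first read

        LazyGzipInputStream(InputStream compressed) {
            this.compressed = compressed;
        }

        private InputStream delegate() throws IOException {
            if (decompressed == null) {
                decompressed = new GZIPInputStream(compressed);
            }
            return decompressed;
        }

        @Override
        public int read() throws IOException {
            return delegate().read();
        }

        // The override under discussion: forward the bulk read directly.
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return delegate().read(b, off, len);
        }

        @Override
        public void close() throws IOException {
            if (decompressed != null) {
                decompressed.close(); // also closes the underlying stream
            } else {
                compressed.close();
            }
        }
    }

    /** Helper for the example: gzip a byte array in memory. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    /** Compresses, then decompresses through the wrapper using buffered reads. */
    static byte[] roundTrip(byte[] data) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = new LazyGzipInputStream(new ByteArrayInputStream(gzip(data)))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                result.write(buf, 0, n);
            }
        }
        return result.toByteArray();
    }
}
```

With the bulk read forwarded, a caller such as copyContent() reaches
GZIPInputStream.read(byte[],int,int) with its full buffer length rather than
with length one.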
Ciao,
seba