Re: Incredibly slow gzip downloads

2014-02-15 Thread Oleg Kalnichevski

On Sat, 2014-02-15 at 11:22 +0100, Sebastiano Vigna wrote:
 Something happened from HttpClient 4.3.1 to 4.3.2.
 
 All of a sudden, we are seeing an immense amount of CPU spent into inflating 
 compressed HTTP responses. We were experimenting 5000 pages/s two months 
 ago, but less than 1200 pages/s now (same hardware, etc.), for no apparent 
 reason. The CPU is 100% consumed by Inflater.
 
 The absolutely weird thing is that in profiling we are spending 15% doing 
 inflation and 1.5% doing deflation... and the same amount of data is going 
 through the two phases. Downgrading to 4.3.1 immediately brought back the old 
 performance, and we spend more time in deflation than in inflation (as it 
 should be).
 
 Maybe some change in the buffer size?
 
 Ciao,
 

Here's the diff between 4.3.1 and 4.3.2

https://github.com/apache/httpclient/compare/4.3.1...4.3.2

The problem mostly likely has been introduced by HTTPCLIENT-1432 [1]. I
reviewed the patch once more and could not find anything obviously wrong
with it. Try reverting the changes introduced by HTTPCLIENT-1432 and see
if that resolves the problem.

Oleg

[1] https://issues.apache.org/jira/browse/HTTPCLIENT-1432




-
To unsubscribe, e-mail: httpclient-users-unsubscr...@hc.apache.org
For additional commands, e-mail: httpclient-users-h...@hc.apache.org



Re: Incredibly slow gzip downloads

2014-02-15 Thread Sebastiano Vigna
On 15 Feb 2014, at 12:47 PM, Oleg Kalnichevski ol...@apache.org wrote:

 The problem mostly likely has been introduced by HTTPCLIENT-1432 [1]. I
 reviewed the patch once more and could not find anything obviously wrong
 with it. Try reverting the changes introduced by HTTPCLIENT-1432 and see

Subclassing InputStream without overriding the multi-byte read() method is a 
recipe for disaster... the inherited method does a byte-by-byte read. You can 
see what's happening in this hprof trace:

java.util.zip.Inflater.inflateBytes(Inflater.java:Unknown line)
java.util.zip.Inflater.inflate(Inflater.java:259)
java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)

org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:56)
java.io.InputStream.read(InputStream.java:179)

it.unimi.di.law.warc.util.InspectableCachedHttpEntity.copyContent(InspectableCachedHttpEntity.java:67)

copyContent() would love to read(byte[],int,int) in a buffer, but since 
LazyDecompressingInputStream doesn't override it it invokes instead the 
read-byte-by-byte inherited method in InputStream, which in turn now calls for 
each byte the one-byte read() method from LazyDecompressingInputStream, which 
invokes the one-byte read method from InflaterInputStream, which does a 
multi-byte, length-one read from GZIPInputStream, which unleashes a similar 
call on InflaterInputStream, which unfortunately makes a similar read using the 
native inflateBytes() method. 

The result is a 10-50x decrease in speed, but I think that a trivial override 
of read(byte[],int,int) in LazyDecompressingInputStream will solve the problem.

Ciao,

seba


-
To unsubscribe, e-mail: httpclient-users-unsubscr...@hc.apache.org
For additional commands, e-mail: httpclient-users-h...@hc.apache.org



Re: Incredibly slow gzip downloads

2014-02-15 Thread Oleg Kalnichevski
On Sat, 2014-02-15 at 17:06 +0100, Sebastiano Vigna wrote:
 On 15 Feb 2014, at 12:47 PM, Oleg Kalnichevski ol...@apache.org wrote:
 
  The problem mostly likely has been introduced by HTTPCLIENT-1432 [1]. I
  reviewed the patch once more and could not find anything obviously wrong
  with it. Try reverting the changes introduced by HTTPCLIENT-1432 and see
 
 Subclassing InputStream without overriding the multi-byte read() method is a 
 recipe for disaster... the inherited method does a byte-by-byte read. You can 
 see what's happening in this hprof trace:
 
 java.util.zip.Inflater.inflateBytes(Inflater.java:Unknown line)
 java.util.zip.Inflater.inflate(Inflater.java:259)
 java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
 java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
 java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)
 
 org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:56)
 java.io.InputStream.read(InputStream.java:179)
 
 it.unimi.di.law.warc.util.InspectableCachedHttpEntity.copyContent(InspectableCachedHttpEntity.java:67)
 
 copyContent() would love to read(byte[],int,int) in a buffer, but since 
 LazyDecompressingInputStream doesn't override it it invokes instead the 
 read-byte-by-byte inherited method in InputStream, which in turn now calls 
 for each byte the one-byte read() method from LazyDecompressingInputStream, 
 which invokes the one-byte read method from InflaterInputStream, which does a 
 multi-byte, length-one read from GZIPInputStream, which unleashes a similar 
 call on InflaterInputStream, which unfortunately makes a similar read using 
 the native inflateBytes() method. 
 
 The result is a 10-50x decrease in speed, but I think that a trivial override 
 of read(byte[],int,int) in LazyDecompressingInputStream will solve the 
 problem.
 

Sebastiano

Could you please raise a JIRA for this defect and tag it as regression?

Oleg



-
To unsubscribe, e-mail: httpclient-users-unsubscr...@hc.apache.org
For additional commands, e-mail: httpclient-users-h...@hc.apache.org