On 2015-05-24 at 14:25, Oleg Kalnichevski wrote:
On Sun, 2015-05-24 at 13:02 +0200, Michael Osipov wrote:
On 2015-05-24 at 12:17, Oleg Kalnichevski wrote:
On Sun, 2015-05-24 at 00:29 +0200, Michael Osipov wrote:
On 2015-05-23 at 22:29, Oleg Kalnichevski wrote:
On Sat, 2015-05-23 at 22:09 +0200, Michael Osipov wrote:
Hi,
we are experiencing a (slight) performance problem with HttpClient 4.4.1
while downloading big files from a remote server in the corporate intranet.
A simple test client:
HttpClientBuilder builder = HttpClientBuilder.create();
try (CloseableHttpClient client = builder.build()) {
    HttpGet get = new HttpGet("...");
    long start = System.nanoTime();
    // Close the response so the connection is released
    try (CloseableHttpResponse response = client.execute(get)) {
        HttpEntity entity = response.getEntity();
        File file = File.createTempFile("prefix", null);
        // Close the file stream so all bytes are written before measuring
        try (OutputStream os = new FileOutputStream(file)) {
            entity.writeTo(os);
        }
        long stop = System.nanoTime();
        long contentLength = file.length();
        long diff = stop - start;
        System.out.printf("Duration: %d ms%n",
                TimeUnit.NANOSECONDS.toMillis(diff));
        System.out.printf("Size: %d%n", contentLength);
        // bytes per nanosecond * 1000 = bytes per microsecond = MB/s
        float speed = contentLength / (float) diff * (1_000_000_000 / 1_000_000);
        System.out.printf("Speed: %.2f MB/s%n", speed);
    }
}
After at least 10 repetitions I see that the 182 MB file is downloaded
within 24 000 ms with about 8 MB/s max. I cannot top that.
I have tried this over and over again with curl and see that curl is
able to saturate the entire LAN connection (100 Mbit/s).
My tests are done on Windows 7 64 bit, JDK 7u67 32 bit.
Any idea what the bottleneck might be?
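For reference, the figures above can be put side by side with a little arithmetic. This is only a sketch: it assumes decimal units (1 MB = 1,000,000 bytes, 1 Mbit = 1,000,000 bits) and treats the file as exactly 182 MB; the class and method names are made up for illustration.

```java
// Sketch only: compares the measured download rate with the nominal link rate.
// Assumes decimal units (1 MB = 1_000_000 bytes, 1 Mbit = 1_000_000 bits).
class ThroughputCheck {

    // Bytes transferred in the given number of milliseconds, expressed as MB/s.
    static double megabytesPerSecond(long bytes, long millis) {
        return (bytes / 1_000_000.0) / (millis / 1_000.0);
    }

    // Nominal link rate in Mbit/s converted to MB/s (8 bits per byte).
    static double linkMegabytesPerSecond(double megabitsPerSecond) {
        return megabitsPerSecond / 8.0;
    }

    public static void main(String[] args) {
        // Figures from the test run: 182 MB in 24 000 ms over a 100 Mbit/s LAN.
        double measured = megabytesPerSecond(182_000_000L, 24_000L);
        double link = linkMegabytesPerSecond(100.0);
        System.out.printf("Measured: %.2f MB/s%n", measured);
        System.out.printf("Link:     %.2f MB/s%n", link);
        System.out.printf("Share of link rate: %.0f %%%n", 100.0 * measured / link);
    }
}
```

Under these assumptions the client reaches about 7.6 MB/s against a nominal 12.5 MB/s, i.e. roughly 60 % of the link rate.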
Thanks for the quick response.
(1) Curl should be using zero copy file transfer which Java blocking i/o
does not support. HttpAsyncClient on the other hand supports zero copy
file transfer and generally tends to perform better when writing content
out directly to the disk.
I did try this [1] example and my heap exploded. After increasing it to
-Xmx1024M, it did saturate the entire connection.
This sounds wrong. The example below does not use zero copy (with zero
copy there should be no heap memory allocation at all).
This example demonstrates how to use zero copy file transfer
http://hc.apache.org/httpcomponents-asyncclient-4.1.x/httpasyncclient/examples/org/apache/http/examples/nio/client/ZeroCopyHttpExchange.java
I have seen this example but there is no ZeroCopyGet. I haven't found
any example which explicitly says use zero-copy for GETs. The example
from [1] did work but with the explosion. What did I do wrong here?
Zero copy can be employed only if a message encloses an entity in it.
Therefore there is no such thing as ZeroCopyGet in HC. One can execute a
normal GET request and use a ZeroCopyConsumer to stream content out
directly to a file without any intermediate buffering in memory.
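Independent of the HttpAsyncClient API, the general idea of streaming content into a file without a user-managed heap buffer can be sketched with plain JDK channels and FileChannel.transferFrom. This is a sketch under assumptions, not the ZeroCopyConsumer implementation or its usage: ChannelTransfer, transferToFile and the 1 MiB chunk size are hypothetical names and values chosen for illustration, and whether the platform actually avoids copies depends on the channel types and the OS.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch only: JDK channel-to-file transfer, not the HttpAsyncClient API.
class ChannelTransfer {

    // Bytes requested per transferFrom call; an arbitrary choice for this sketch.
    private static final long CHUNK = 1L << 20;

    // Streams everything from 'in' into 'target'; returns the number of bytes written.
    static long transferToFile(InputStream in, Path target) throws IOException {
        try (ReadableByteChannel source = Channels.newChannel(in);
             FileChannel file = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long n;
            // For a blocking source, a return value of 0 means it is exhausted.
            while ((n = file.transferFrom(source, position, CHUNK)) > 0) {
                position += n;
            }
            return position;
        }
    }
}
```

The caller never allocates a byte[] for the payload, which is the property being discussed here; the loop is needed because a single transferFrom call may move fewer bytes than requested.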
OK, that has confirmed my assumptions.
(2) Use larger socket / intermediate buffers. Default buffer size used
by Entity implementations is most likely suboptimal.
That did not make any difference. I have changed:
1. Socket receive size
2. Employed a buffered input stream
3. Manually copied the stream to a file
I have varied the buffer size from 2^14 to 2^20 bytes, to no avail.
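For anyone wanting to repeat the third variant, a manual copy loop with a configurable buffer size might look roughly like the following. It is a minimal sketch, not the code that was actually used in these tests; BufferedCopy and copy are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch only: manual stream copy with a caller-chosen buffer size.
class BufferedCopy {

    // Copies 'in' to 'out' through a buffer of 'bufferSize' bytes; returns bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; it may fill only part of the buffer.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

With HttpClient, the input would be the stream from HttpEntity#getContent and the buffer size one of the values tried above, e.g. 1 << 14 up to 1 << 20.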
Regardless of this, your tip with zero copy helped me a lot.
Unfortunately, this is just one small piece in a performance degradation
chain a colleague has figured out. HttpClient acts as an intermediary in
a webapp which receives a request via REST from a client, processes it
and opens up the stream to the huge files from a remote server. Without
caching the files to disk, I am passing the Entity#getContent stream
back to the client. The degradation is about 75 %.
After rethinking your tips, I just checked the servers I am pulling data
from. One is slow, the other one is fast. Transfer speed when piping the
streams from the fast server remains at 8 MB/s, which is what I wanted,
after I identified an issue with my custom HttpResponseInputStream.
I modified my code to use the async client and it seems to pipe with
maximum LAN speed though it looks weird with curl now. Curl blocks for
15 seconds and within a second the entire stream is written down to disk.
It all sounds very bizarre. I see no reason why HttpAsyncClient without
zero copy transfer should do any better than HttpClient in this
scenario.
So you are saying something is probably wrong with my client setup?
I think it is not unlikely.
Assuming I'd have the time to investigate that, I currently have no idea
where to start looking for the mismatch.