Hi Yan Cheng,

See below, but one caveat: Oleg could very well correct all of my comments :)

On Aug 16, 2009, at 6:17pm, yccheok wrote:

Hi Ken,

So, in my case, I should set

httpConnectionManagerParams.setDefaultMaxConnectionsPerHost(50);

Yes, if all of your requests will be going to the same domain, and you're going to be hitting it with all 50 threads at the same time. But that's not a normal use case - hope you're really good friends with that site's ops team :)

E.g. in Bixo we configure HttpClient for one thread per host, as that's what you need for polite crawling.

httpConnectionManagerParams.setMaxTotalConnections(50);
// hostConfiguration will be obtained from HttpClient itself.
httpConnectionManagerParams.setMaxConnectionsPerHost(hostConfiguration, 50);

Is there any side effect of setting the number too high, like 1000?

I don't know the details of how HttpClient (3.x or 4.x) allocates connections in the pool, but I assume they only create a connection when one is needed, there's no free connection, and the total number of connections is less than this limit.

So leaving aside issues of memory requirements, max # of open sockets, etc. that you'd hit with 1000 active connections, I don't think there would be any issue with using a large value.
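
To make the assumption above concrete, here is a minimal JDK-only sketch of a pool that creates a connection only when one is needed, none is free, and the cap has not been reached. It is not HttpClient's actual implementation; LazyPool and its methods are hypothetical names, and a plain Object stands in for a connection.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a lazily filled pool; not HttpClient's real code.
class LazyPool {
    private final int maxTotal;
    private final Deque<Object> free = new ArrayDeque<>();
    private int created = 0;

    LazyPool(int maxTotal) {
        this.maxTotal = maxTotal;
    }

    // Reuses a free connection if there is one; otherwise creates a new one
    // only while the cap allows it. Returns null when the pool is exhausted
    // (a real pool would block the caller instead).
    synchronized Object acquire() {
        if (!free.isEmpty()) {
            return free.pop();
        }
        if (created < maxTotal) {
            created++;
            return new Object(); // stand-in for opening a connection
        }
        return null;
    }

    // Returns a connection to the pool so a later acquire() can reuse it.
    synchronized void release(Object connection) {
        free.push(connection);
    }

    // Number of connections created so far.
    synchronized int created() {
        return created;
    }
}
```

Under this model a cap of 1000 costs nothing by itself: a caller that acquires, releases, and acquires again still ends up with a single created connection.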

Compared to 100 HttpClient instances with maxConnection = 10 each, will a single HttpClient with maxConnection = 1000 perform better? Or does it depend on the situation?

I think performance will mostly depend on the servers that you're accessing.

See http://ken-blog.krugler.org/2009/05/19/performance-problems-with-verticalfocused-web-crawling/ for a blog post I wrote about crawl performance. This was using Bixo and HttpClient 4.0.

I know HttpClient maintains its own connection pool. Does "this figure" (1000) affect the "number of simultaneous connections allowed" at a given time? Or is "this figure" itself the number of connections allowed in the HttpClient connection pool?

There are two HttpClient-based limits for maximum number of simultaneous connections - the max connections per host and the max total connections. Assuming you are hitting 1000 different hosts, then you could have 1000 simultaneous connections. Though you'll also typically run into other limits, like running out of system memory due to the amount of stack space used per thread, or DNS lookups becoming slow, etc.
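
As a rough illustration of how the two limits interact, the following JDK-only sketch grants a connection only when both the per-host count and the total count are under their caps. It is a simplified model, not HttpClient's code; ConnectionLimits, tryAcquire, release, and totalInUse are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the per-host and total limits; not HttpClient's real code.
class ConnectionLimits {
    private final int maxPerHost;
    private final int maxTotal;
    private final Map<String, Integer> inUseByHost = new HashMap<>();
    private int inUseTotal = 0;

    ConnectionLimits(int maxPerHost, int maxTotal) {
        this.maxPerHost = maxPerHost;
        this.maxTotal = maxTotal;
    }

    // Grants a connection only if both limits still have room.
    synchronized boolean tryAcquire(String host) {
        int hostCount = inUseByHost.getOrDefault(host, 0);
        if (hostCount >= maxPerHost || inUseTotal >= maxTotal) {
            return false;
        }
        inUseByHost.put(host, hostCount + 1);
        inUseTotal++;
        return true;
    }

    // Gives back one connection for the host, if any is currently in use.
    synchronized void release(String host) {
        int hostCount = inUseByHost.getOrDefault(host, 0);
        if (hostCount == 0) {
            return; // nothing to release for this host
        }
        inUseByHost.put(host, hostCount - 1);
        inUseTotal--;
    }

    // Total number of connections currently handed out.
    synchronized int totalInUse() {
        return inUseTotal;
    }
}
```

With both caps set to 1000 in this model, 1000 simultaneous connections to a single host would be allowed; with a per-host cap of 1, reaching 1000 requires 1000 distinct hosts.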

-- Ken


Ken Krugler wrote:

Hi Yan Cheng,

I haven't used HttpClient 3.x for a while - switched to 4.0 and
haven't looked back.

But in general method A is going to work better. You can configure the
MultiThreadedHttpConnectionManager with a maximum number of connections -
e.g. you could pick a number equal to the max # of threads that you
know will be using it. If it's configured with fewer connections than the
max number of threads, then some of your connection requests will block
until a free connection becomes available - and if the wait exceeds a
(configurable) timeout, you'll get an exception.
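
The blocking behaviour described here (wait for a free connection, fail once a configurable limit is exceeded) can be sketched with a plain JDK Semaphore. This is only a model of the idea, not how HttpClient implements it; BlockingPool is a hypothetical name, and IllegalStateException is a stand-in for whatever exception the real library throws.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a bounded pool with a wait timeout; not HttpClient's real code.
class BlockingPool {
    private final Semaphore permits;

    BlockingPool(int maxConnections) {
        // Fair semaphore so waiting threads are served in arrival order.
        this.permits = new Semaphore(maxConnections, true);
    }

    // Waits up to timeoutMillis for a free slot and throws if none frees up in time.
    void acquire(long timeoutMillis) {
        try {
            if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("timed out waiting for a connection");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Frees one slot, waking a waiting thread if there is one.
    void release() {
        permits.release();
    }
}
```

Callers would pair every successful acquire with a release in a finally block, the same way the snippets in this thread pair executeMethod with releaseConnection.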

In extreme situations I've run with up to 1000 threads and one
connection manager, so I don't think you'll hit any limits there.
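
For a feel of what many threads sharing one bounded pool looks like, here is a small JDK-only simulation. It assumes a Semaphore as the pool and a short sleep as a stand-in for the HTTP request; SharedPoolDemo and run are hypothetical names, and nothing here measures HttpClient itself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation of many threads sharing one bounded pool.
class SharedPoolDemo {
    // Runs `threads` workers against one shared pool of `maxConnections` slots.
    // Returns {completed requests, peak number of slots in use at once}.
    static int[] run(int threads, int maxConnections) {
        Semaphore pool = new Semaphore(maxConnections);
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            executor.execute(() -> {
                try {
                    pool.acquire(); // blocks only while every slot is taken
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = inUse.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // stand-in for the HTTP request
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inUse.decrementAndGet();
                    pool.release();
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { completed.get(), peak.get() };
    }
}
```

Calling SharedPoolDemo.run(50, 10) models 50 threads with 10 connections: threads beyond the tenth wait their turn rather than fail, and the peak in-use count never exceeds the cap.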

-- Ken


On Aug 16, 2009, at 6:11am, Yan Cheng Cheok wrote:

Hi all,

All the while, I have been using HttpClient in a multithreaded
environment. Every thread, when it initiates a connection, creates
a completely new HttpClient instance.

Recently I discovered that this approach can leave the user with
too many open ports, with most of the connections in the
TIME_WAIT state.

http://www.opensubscriber.com/message/commons-httpclient-...@jakarta.apache.org/86045.html

Hence, instead of each thread doing:
HttpClient c = new HttpClient();
try {
  c.executeMethod(method);
}
catch (IOException e) {
  // handle or log the failure
}
finally {
  method.releaseConnection();
}


We plan to have :

[METHOD A]

// global_c is initialized once through
// HttpClient global_c = new HttpClient(new MultiThreadedHttpConnectionManager());

try {
  global_c.executeMethod(method);
}
catch (IOException e) {
  // handle or log the failure
}
finally {
  method.releaseConnection();
}

In a normal situation, global_c will be accessed by 50++ threads
concurrently. I was wondering whether this will cause any
performance issue. Does MultiThreadedHttpConnectionManager use a
lock-free mechanism to implement its thread safety?

Is it possible that, if 10 threads are using global_c, the other 40
threads will be blocked?

Or would it be better if every thread creates its own HttpClient
instance, but shuts down the connection manager explicitly?

[METHOD B]
HttpClient c = new HttpClient();
try {
  c.executeMethod(method);
}
catch (IOException e) {
  // handle or log the failure
}
finally {
  method.releaseConnection();
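  // shutdown() is not declared on the HttpConnectionManager interface, so cast
  // to the default SimpleHttpConnectionManager (assumes the no-arg constructor)
  ((SimpleHttpConnectionManager) c.getHttpConnectionManager()).shutdown();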
}

Does c.getHttpConnectionManager().shutdown() suffer from performance issues?

May I know which method (A or B) is better for an application using
50++ threads?

I am using HttpClient 3.1

Thanks and Regards
Yan Cheng Cheok


