I'm debugging a distributed crawler, and my last
remaining issue appears to be with HttpClient.
Everything runs well for a while, but then some of the
nodes stop working. They seem unable to get a
connection from the connection manager: the threads
are waiting on a notification that never comes.

If anyone has any ideas on this one, please let me
know!

-George (debug info follows)

I enabled logging on the
MultiThreadedHttpConnectionManager, specifically
because that's where the failure seems to occur. Here
are the last few debug lines from a node that has
stopped:

DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Freeing connection, hostConfig=HostConfiguration[host=http://www.oregonrealtors.org]
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Notifying thread waiting on host pool, hostConfig=HostConfiguration[host=http://www.oregonrealtors.org]
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Getting free connection, hostConfig=HostConfiguration[host=http://www.oregonrealtors.org]
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] enter HttpConnectionManager.releaseConnection(HttpConnection)
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Freeing connection, hostConfig=HostConfiguration[host=http://www.oregonrealtors.org]
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] No-one waiting on host pool, notifying next waiting thread.
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Unable to get a connection, waiting..., hostConfig=HostConfiguration[host=http://evangelion.polito.it]



In addition, here is a JRockit thread dump showing a
download thread that is trying to get a connection
from the connection manager but is waiting for a
notification (there are actually many threads in this
state):



"ThreadPool-Crawler DownloadJob PooledThread-0-running" id=256 idx=0xa2 tid=-1325696080 prio=5 alive, in native, waiting
    -- Waiting for notification on: org/apache/commons/httpclient/[EMAIL PROTECTED]
    at jrockit/vm/Threads.waitForSignal()V(Native Method)
    at jrockit/vm/Locks.wait(Ljava/lang/Object;)V(Unknown Source)[optimized]
    at jrockit/vm/Locks.wait(Ljava/lang/Object;J)V(Unknown Source)
    at org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.doGetConnection(Lorg/apache/commons/httpclient/HostConfiguration;J)Lorg/apache/commons/httpclient/HttpConnection;(MultiThreadedHttpConnectionManager.java:509)
    ^-- Lock released while waiting: org/apache/commons/httpclient/[EMAIL PROTECTED]
    at org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.getConnectionWithTimeout(Lorg/apache/commons/httpclient/HostConfiguration;J)Lorg/apache/commons/httpclient/HttpConnection;(MultiThreadedHttpConnectionManager.java:394)
    at org/apache/commons/httpclient/HttpMethodDirector.executeMethod(Lorg/apache/commons/httpclient/HttpMethod;)V(HttpMethodDirector.java:152)
    at org/apache/commons/httpclient/HttpClient.executeMethod(Lorg/apache/commons/httpclient/HostConfiguration;Lorg/apache/commons/httpclient/HttpMethod;Lorg/apache/commons/httpclient/HttpState;)I(HttpClient.java:396)
    at org/apache/commons/httpclient/HttpClient.executeMethod(Lorg/apache/commons/httpclient/HttpMethod;)I(HttpClient.java:324)[inlined]
    at crawler/util/FetcherUtil.getContentAsString(Ljava/lang/String;)Ljava/lang/String;(FetcherUtil.java:68)[optimized]
    at crawler/fetch/Downloadable.run()V(Downloadable.java:44)[optimized]
    at services/threadpool/ThreadPoolThread.run()V(ThreadPoolThread.java:83)
    at jrockit/vm/RNI.c2java()V(Native Method)
    -- end of trace
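
For context, one general way a pooled client can end up with
every thread waiting like this is when connections are not
returned to the pool on an error path. Whether that is what is
happening here is not established by the log or the dump above.
The sketch below is not HttpClient code: TinyPool, leakyFetch
and safeFetch are hypothetical names, and a plain
java.util.concurrent.Semaphore stands in for the connection
manager. It only illustrates how a skipped release leaves later
callers with nothing to acquire, and how a finally block avoids
that.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolStarvationSketch {

    /** Hypothetical stand-in for a pool with a fixed number of connection slots. */
    static final class TinyPool {
        private final Semaphore slots;

        TinyPool(int maxConnections) {
            slots = new Semaphore(maxConnections);
        }

        /** Waits up to timeoutMs for a slot; returns false on timeout or interrupt. */
        boolean acquire(long timeoutMs) {
            try {
                return slots.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        void release() {
            slots.release();
        }

        int free() {
            return slots.availablePermits();
        }
    }

    /** Leaky pattern: the slot is returned only on the success path. */
    static boolean leakyFetch(TinyPool pool, boolean simulateError) {
        if (!pool.acquire(50)) {
            return false;
        }
        try {
            if (simulateError) {
                throw new RuntimeException("simulated I/O error");
            }
            pool.release();
        } catch (RuntimeException e) {
            // The error is handled, but the slot is never given back.
        }
        return true;
    }

    /** Safer pattern: the slot is returned in finally, whatever happens. */
    static boolean safeFetch(TinyPool pool, boolean simulateError) {
        if (!pool.acquire(50)) {
            return false;
        }
        try {
            if (simulateError) {
                throw new RuntimeException("simulated I/O error");
            }
        } catch (RuntimeException e) {
            // The error is handled here; cleanup happens in finally.
        } finally {
            pool.release();
        }
        return true;
    }

    /** Runs the given number of failing fetches on a 2-slot pool and reports free slots. */
    static int freeSlotsAfterFailures(boolean useSafe, int failures) {
        TinyPool pool = new TinyPool(2);
        for (int i = 0; i < failures; i++) {
            if (useSafe) {
                safeFetch(pool, true);
            } else {
                leakyFetch(pool, true);
            }
        }
        return pool.free();
    }

    public static void main(String[] args) {
        System.out.println("leaky: free slots = " + freeSlotsAfterFailures(false, 2));
        System.out.println("safe:  free slots = " + freeSlotsAfterFailures(true, 2));
    }
}
```

With a real blocking wait instead of the 50 ms timeout used
here, the callers that find no free slot would simply park, much
like the waiting threads in the dump.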

