I have a distributed crawler that I'm debugging, and it appears that my last issue is with HttpClient. Everything runs well for a while, but then some of the nodes stop working because they seem to be unable to get a connection from the connection manager: their threads are waiting on a notification that never comes.
If anyone has any ideas on this one, please let me know!

-George

(debug info follows)

I enabled logging on the MultiThreadedHttpConnectionManager, specifically because that's where the failure seems to occur. Here are the last few debug lines from a node that has stopped:

DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Freeing connection, hostConfig=HostConfiguration[host=http://www.oregonrealtors.org]
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Notifying thread waiting on host pool, hostConfig=HostConfiguration[host=http://www.oregonrealtors.org]
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Getting free connection, hostConfig=HostConfiguration[host=http://www.oregonrealtors.org]
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] enter HttpConnectionManager.releaseConnection(HttpConnection)
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Freeing connection, hostConfig=HostConfiguration[host=http://www.oregonrealtors.org]
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] No-one waiting on host pool, notifying next waiting thread.
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Unable to get a connection, waiting..., hostConfig=HostConfiguration[host=http://evangelion.polito.it]

In addition, here is a JRockit thread dump showing a download thread that is trying to get a connection from the connection manager but is waiting for notification (there are actually many threads in this state):

"ThreadPool-Crawler DownloadJob PooledThread-0-running" id=256 idx=0xa2 tid=-1325696080 prio=5 alive, in native, waiting
    -- Waiting for notification on: org/apache/commons/httpclient/[EMAIL PROTECTED]
    at jrockit/vm/Threads.waitForSignal()V(Native Method)
    at jrockit/vm/Locks.wait(Ljava/lang/Object;)V(Unknown Source)[optimized]
    at jrockit/vm/Locks.wait(Ljava/lang/Object;J)V(Unknown Source)
    at org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.doGetConnection(Lorg/apache/commons/httpclient/HostConfiguration;J)Lorg/apache/commons/httpclient/HttpConnection;(MultiThreadedHttpConnectionManager.java:509)
    ^-- Lock released while waiting: org/apache/commons/httpclient/[EMAIL PROTECTED]
    at org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.getConnectionWithTimeout(Lorg/apache/commons/httpclient/HostConfiguration;J)Lorg/apache/commons/httpclient/HttpConnection;(MultiThreadedHttpConnectionManager.java:394)
    at org/apache/commons/httpclient/HttpMethodDirector.executeMethod(Lorg/apache/commons/httpclient/HttpMethod;)V(HttpMethodDirector.java:152)
    at org/apache/commons/httpclient/HttpClient.executeMethod(Lorg/apache/commons/httpclient/HostConfiguration;Lorg/apache/commons/httpclient/HttpMethod;Lorg/apache/commons/httpclient/HttpState;)I(HttpClient.java:396)
    at org/apache/commons/httpclient/HttpClient.executeMethod(Lorg/apache/commons/httpclient/HttpMethod;)I(HttpClient.java:324)[inlined]
    at crawler/util/FetcherUtil.getContentAsString(Ljava/lang/String;)Ljava/lang/String;(FetcherUtil.java:68)[optimized]
    at crawler/fetch/Downloadable.run()V(Downloadable.java:44)[optimized]
    at services/threadpool/ThreadPoolThread.run()V(ThreadPoolThread.java:83)
    at jrockit/vm/RNI.c2java()V(Native Method)
    -- end of trace
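
One pattern that can produce threads parked like the one in the dump above is a pooled connection that gets checked out and never returned, for example on an error path. Nothing in the logs confirms that this is the cause here, so the following is only a minimal sketch of that failure mode under stated assumptions. It is not HttpClient code: TinyPool, leakyFetch and safeFetch are hypothetical names, and a java.util.concurrent.Semaphore stands in for the connection manager. In HttpClient 3.x terms, the analogous measures would be calling HttpMethod.releaseConnection() in a finally block and setting a connection manager timeout so that a starved thread fails instead of waiting forever.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolLeakSketch {

    /** Hypothetical stand-in for a connection pool: one permit per connection. */
    public static final class TinyPool {
        private final Semaphore permits;

        public TinyPool(int maxConnections) {
            this.permits = new Semaphore(maxConnections);
        }

        /** Waits up to timeoutMs for a free connection; false means none arrived in time. */
        public boolean acquire(long timeoutMs) {
            try {
                return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }

        public void release() {
            permits.release();
        }

        public int available() {
            return permits.availablePermits();
        }
    }

    /** Leaky variant: the failure path returns without giving the connection back. */
    public static boolean leakyFetch(TinyPool pool, boolean downloadFails) {
        if (!pool.acquire(100)) {
            return false; // pool exhausted; a wait with no timeout would block here indefinitely
        }
        if (downloadFails) {
            return false; // BUG: the connection is never released
        }
        pool.release();
        return true;
    }

    /** Safe variant: the connection is released in finally on every path. */
    public static boolean safeFetch(TinyPool pool, boolean downloadFails) {
        if (!pool.acquire(100)) {
            return false;
        }
        try {
            return !downloadFails;
        } finally {
            pool.release();
        }
    }

    public static void main(String[] args) {
        TinyPool leaky = new TinyPool(2);
        leakyFetch(leaky, true);
        leakyFetch(leaky, true);
        System.out.println("leaky: free=" + leaky.available()
                + ", next fetch ok=" + leakyFetch(leaky, false));

        TinyPool safe = new TinyPool(2);
        safeFetch(safe, true);
        safeFetch(safe, true);
        System.out.println("safe: free=" + safe.available()
                + ", next fetch ok=" + safeFetch(safe, false));
    }
}
```

With a pool of two, two failed downloads are enough to starve the leaky variant: the third fetch gives up after its 100 ms wait, where an untimed wait would simply hang. The safe variant still has both connections free after the same failures.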
