Hi,
I am implementing a multithreaded crawler using HttpClient. The fetching tasks are managed by a ThreadPoolExecutor. But I've hit a weird problem: memory usage keeps increasing as each new task starts to run. Here's the code of the fetcher:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.GetMethod;

public class TestFetcher implements Runnable {
    String urlObj;
    HttpClient client;

    TestFetcher(HttpClient client, String url) {
        this.client = client;
        urlObj = url;
    }

    public void run() {
        process(urlObj);
    }

    public synchronized void process(String url) {
        HttpMethod method = new GetMethod(url);
        method.setFollowRedirects(true);
        String content = null;
        int fd = 0;
        try {
            client.executeMethod(method);
            Thread.sleep(1000);
            int code = method.getStatusCode();
            if (code == 200) {
                content = method.getResponseBodyAsString();
            } else {
                fd = 10 + code / 100;
            }
        } catch (Exception e) {
            fd = 10;
        } finally {
            method.releaseConnection();
            method = null;
        }
    }
}

And this is how I create new tasks:

while (true) {
    taskPool.execute(new TestFetcher(httpclient, urlPool.getTaskQueue().take()));
    while (some condition) Thread.sleep(delay);
}
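For reference, taskPool and the URL queue are wired up roughly like this. This is a simplified, self-contained sketch of my setup: the pool sizes, queue bounds, and the PoolSetup/dispatchOne names are made up for illustration, and the Runnable just records the URL instead of doing a real fetch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSetup {
    // Takes one URL from a bounded queue, hands it to the pool,
    // and waits for the task to finish (stand-in for the real fetch).
    public static String dispatchOne() throws InterruptedException {
        // Bounded work queue so pending tasks cannot pile up without limit
        BlockingQueue<Runnable> workQueue = new ArrayBlockingQueue<Runnable>(100);

        // 5 core threads, 10 max, idle threads reaped after 60s (illustrative sizes)
        ThreadPoolExecutor taskPool =
                new ThreadPoolExecutor(5, 10, 60L, TimeUnit.SECONDS, workQueue);

        // The URL queue that the while(true) loop take()s from
        BlockingQueue<String> taskQueue = new ArrayBlockingQueue<String>(1000);
        taskQueue.put("http://example.com/");

        final String url = taskQueue.take();
        final StringBuilder result = new StringBuilder();
        final CountDownLatch done = new CountDownLatch(1);
        taskPool.execute(new Runnable() {
            public void run() {
                // In the real crawler this is new TestFetcher(httpclient, url).run()
                result.append("fetched " + url);
                done.countDown();
            }
        });
        done.await(5, TimeUnit.SECONDS);
        taskPool.shutdown();
        return result.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(dispatchOne());
    }
}
```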

I used to use HttpURLConnection to do the fetching, and there was no memory problem at all. The reason I want to use HttpClient is that it can take IP addresses instead of domain names.

Please help.
Thanks,

Yang

