James,

Keep in mind that Java memory profilers tend to report which resource is not being freed, not what is causing the leak. Your code is holding a reference to something that it should not.

I have done some extensive load/performance testing with my HttpClient based application. After 250 million calls to a Tomcat servlet, there were no observed memory leaks. That was with HttpClient 3.0rc3, Java 1.5.06.

Our performance testing also showed that performance slowed down when our application went past 100 threads. This may be due to a limitation with the Tomcat instance we were calling. But with 300 threads, I wonder if your application is spending more time context switching between threads than doing real work. This was on a 3.0GHz dual processor machine running Linux.
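
To make the thread-count point concrete, here is a minimal sketch of capping the number of concurrent downloads with a fixed-size pool from java.util.concurrent. It is not taken from either application: the names BoundedCrawl, Fetcher and crawl are hypothetical, Fetcher.fetch stands in for a download routine such as getPageApache, and the pool size of 100 only echoes the figure observed above, not a tuned value. The task returns just the page length, so the sketch itself does not keep page bodies reachable once a task has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BoundedCrawl {

    /** Hypothetical stand-in for the real page download. */
    public interface Fetcher {
        String fetch(String url) throws Exception;
    }

    /**
     * Fetches every URL using at most maxThreads worker threads.
     * Returns the length of each page (-1 if the fetch returned null),
     * in the same order as the input list.
     */
    public static List<Integer> crawl(List<String> urls, final Fetcher fetcher,
                                      int maxThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<Integer>> pending = new ArrayList<Future<Integer>>();
            for (final String url : urls) {
                pending.add(pool.submit(new Callable<Integer>() {
                    public Integer call() throws Exception {
                        // The page string is local to this task and becomes
                        // collectable as soon as call() returns.
                        String page = fetcher.fetch(url);
                        return page == null ? -1 : page.length();
                    }
                }));
            }
            List<Integer> sizes = new ArrayList<Integer>();
            for (Future<Integer> f : pending) {
                sizes.add(f.get()); // waits for that task to complete
            }
            return sizes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            urls.add("http://example.invalid/page" + i);
        }
        // Fake fetcher so the sketch runs without a network connection.
        Fetcher fake = new Fetcher() {
            public String fetch(String url) {
                return "<html>" + url + "</html>";
            }
        };
        List<Integer> sizes = crawl(urls, fake, 100);
        System.out.println("Fetched " + sizes.size() + " pages");
    }
}
```

With a fixed pool, the number of response bodies alive at any moment is bounded by the pool size, as long as nothing downstream keeps the strings. The sketch does queue one small Future per URL up front, so a very large crawl would want to feed the pool in batches instead.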

--Steve

-----Original Message-----
From: James Ostheimer [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 14, 2006 1:53 AM
To: [email protected]
Subject: Memory leak using httpclient

Hi-

I am using httpclient in a multi-threaded webcrawler application. I am using the MultiThreadedHttpConnectionManager in conjunction with 300 threads that download pages from various sites.

Problem is that I am running out of memory shortly after the process begins. I used JProfiler to analyze the memory stacks and it points to:

    76.2% - 233,587 kB - 6,626 alloc. org.apache.commons.httpclient.HttpMethod.getResponseBodyAsString

as the culprit (at most there should be a little over 300 allocations as there are 300 threads operating at once). Other relevant information, I am on a Windows XP Pro platform using the SUN JRE that came with jdk1.5.0_06. I am using commons-httpclient-3.0.jar.

Here is the code where I initialize the HttpClient:

    private HttpClient httpClient;

    public CrawlerControllerThread(QueueThread qt, MessageReceiver receiver,
            int maxThreads, String flag, boolean filter, String filterString,
            String dbType) {
        this.qt = qt;
        this.receiver = receiver;
        this.maxThreads = maxThreads;
        this.flag = flag;
        this.filter = filter;
        this.filterString = filterString;
        this.dbType = dbType;
        threads = new ArrayList();
        lastStatus = new HashMap();

        HttpConnectionManagerParams htcmp = new HttpConnectionManagerParams();
        htcmp.setMaxTotalConnections(maxThreads);
        htcmp.setDefaultMaxConnectionsPerHost(10);
        htcmp.setSoTimeout(5000);
        MultiThreadedHttpConnectionManager mtcm = new MultiThreadedHttpConnectionManager();
        mtcm.setParams(htcmp);
        httpClient = new HttpClient(mtcm);
    }

The client reference to httpClient is then passed to all the crawling threads where it is used as follows:

    private String getPageApache(URL pageURL, ArrayList unProcessed) {
        SaveURL saveURL = new SaveURL();
        HttpMethod method = null;
        HttpURLConnection urlConnection = null;
        String rawPage = "";
        try {
            method = new GetMethod(pageURL.toExternalForm());
            method.setFollowRedirects(true);
            method.setRequestHeader("Content-type", "text/html");
            int statusCode = httpClient.executeMethod(method);
            // urlConnection = new HttpURLConnection(method,
            // pageURL);
            logger.debug("Requesting: "+pageURL.toExternalForm());
            rawPage = method.getResponseBodyAsString();
            //rawPage = saveURL.getURL(urlConnection);
            if(rawPage == null){
                unProcessed.add(pageURL);
            }
            return rawPage;
        } catch (IllegalArgumentException e) {
            //e.printStackTrace();
        } catch (HttpException e) {
            //e.printStackTrace();
        } catch (IOException e) {
            unProcessed.add(pageURL);
            //e.printStackTrace();
        } finally {
            if(method != null) {
                method.releaseConnection();
            }
            try {
                if(urlConnection != null) {
                    if(urlConnection.getInputStream() != null) {
                        urlConnection.getInputStream().close();
                    }
                }
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            urlConnection = null;
            method = null;
        }
        return null;
    }

As you can see, I release the connection in the finally statement, so that should not be a problem. Upon running the getPageApache above, the returned page as a string is processed and then set to null for garbage collection. I have been playing with this, closing streams, using HttpUrlConnection instead of the GetMethod, and I cannot find the answer. Indeed it seems the answer does not lie in my code.

I greatly appreciate any help that anyone can give me, I am at the end of my rope with this one.

James
