[ http://issues.apache.org/jira/browse/NUTCH-109?page=comments#action_12331877 ]
Fuad Efendi commented on NUTCH-109:
-----------------------------------

OK, I'll do it tonight. I believe fetcher.server.delay means "wait for a response from the server, then throw a timeout exception".

I can also run 1000 threads, so we will have a fair comparison even with fetcher.server.delay=50 seconds. It is fair because, with that many threads, we will probably still get about 20 requests per second (20 * 50 = 1000).

> Nutch - Fetcher - Performance Test - new Protocol-HTTPClient-Innovation
> -----------------------------------------------------------------------
>
>          Key: NUTCH-109
>          URL: http://issues.apache.org/jira/browse/NUTCH-109
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>     Versions: 0.7, 0.8-dev, 0.6, 0.7.1
>  Environment: Nutch: Windows XP, J2SE 1.4.2_09
>               Web Server: Suse Linux, Apache HTTPD, apache2-worker, v. 2.0.53
>     Reporter: Fuad Efendi
>  Attachments: protocol-httpclient-innovation-0.1.0.zip
>
> 1. A TCP connection costs a lot, not only for Nutch and the end-point web
> servers, but also for intermediary network equipment.
> 2. The web server creates a client thread and hopes that Nutch really uses
> HTTP/1.1, or at least that Nutch sends "Connection: close" before closing the
> socket with "Socket.close()" in the JVM...
> I need to perform very objective tests, probably over 2-3 days; the new plugin
> crawled/parsed 23,000 pages in 1,321 seconds, while it seems that the existing
> http-plugin needs a few days...
> I am using a separate network segment with Windows XP (Nutch) and Suse Linux
> (Apache HTTPD + 120,000 pages).
> Please find attached a new plugin based on
> http://www.innovation.ch/java/HTTPClient/
> Please note:
> Class HttpFactory contains a cache of HTTPConnection objects; each object can
> be used from any thread; each object is absolutely thread-safe, so we can send
> multiple GET requests using a single instance:
>     private static int CLIENTS_PER_HOST =
>         NutchConf.get().getInt("http.clients.per.host", 3);
> I'll add more comments after finishing the tests...
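The per-host caching described in the note about HttpFactory can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the plugin's actual HttpFactory: the class PerHostClientCache, its methods and the factory function are hypothetical, and the type parameter C stands in for a thread-safe connection type such as HTTPClient's HTTPConnection. It keeps a fixed number of shared clients per host (the count playing the role of http.clients.per.host, default 3) and hands them out round-robin. It also assumes Java 8 or later, whereas the environment above is J2SE 1.4.2, which has neither generics nor java.util.concurrent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch (not the plugin's HttpFactory): a per-host cache of a
// fixed number of shared client objects, handed out round-robin. C stands in
// for a thread-safe connection type such as HTTPClient's HTTPConnection.
public class PerHostClientCache<C> {

    // The clients for one host plus a counter used to rotate through them.
    private static final class Pool<T> {
        private final List<T> clients;
        private final AtomicInteger next = new AtomicInteger();

        Pool(List<T> clients) {
            this.clients = clients;
        }

        T pick() {
            // floorMod keeps the index non-negative even after the counter overflows.
            return clients.get(Math.floorMod(next.getAndIncrement(), clients.size()));
        }
    }

    private final int clientsPerHost;          // plays the role of http.clients.per.host
    private final Function<String, C> factory; // hypothetical: opens one client for a host
    private final Map<String, Pool<C>> pools = new ConcurrentHashMap<>();

    public PerHostClientCache(int clientsPerHost, Function<String, C> factory) {
        if (clientsPerHost < 1) {
            throw new IllegalArgumentException("clientsPerHost must be at least 1");
        }
        this.clientsPerHost = clientsPerHost;
        this.factory = factory;
    }

    // Returns a cached client for the host; the host's pool is built on first use.
    // Because the clients are assumed thread-safe, several fetcher threads may
    // receive and use the same instance concurrently.
    public C clientFor(String host) {
        Pool<C> pool = pools.computeIfAbsent(host, h -> {
            List<C> list = new ArrayList<>(clientsPerHost);
            for (int i = 0; i < clientsPerHost; i++) {
                list.add(factory.apply(h));
            }
            return new Pool<>(list);
        });
        return pool.pick();
    }

    public int hostCount() {
        return pools.size();
    }

    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        // Strings stand in for real connections; each label records creation order.
        PerHostClientCache<String> cache =
            new PerHostClientCache<>(3, host -> host + "#" + opened.getAndIncrement());
        for (int i = 0; i < 4; i++) {
            System.out.println(cache.clientFor("example.org"));
        }
        System.out.println("connections opened: " + opened.get());
    }
}
```

Because ConcurrentHashMap.computeIfAbsent applies the mapping function at most once per key, concurrent first requests to the same host open only clientsPerHost connections. The trade-off is that those connections are opened while the map entry is locked, which a real fetcher might prefer to avoid for slow hosts.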