Hi, thanks for the reply!
I tried the configuration from the link, but it didn't seem to help much, at least not enough to reach 20 pages/sec. Could it be because I'm crawling an Intranet site? I'd really like to know how other people get their crawls to run so fast. Any pointers are appreciated! Thanks!!

Audrey

Audrey Liu wrote:
>
> Hi,
>
> I am using Nutch 0.9, and I'm trying to crawl our Intranet site (~60,000
> pages, ~28,000 htmls). I've seen other posts where people mentioned they
> can get their crawler to do 20 pages/sec, and the best I've seen so far is
> only 8 pages/sec.
>
> I've also read that the fetcher threads tend to block when they try to
> fetch pages from the same host. So I'm wondering what kind of
> configuration I should set to get the best performance. My current
> configuration in nutch-site.xml is as follows:
>
> <property>
>   <name>fetcher.threads.fetch</name>
>   <value>200</value>
> </property>
>
> <property>
>   <name>fetcher.threads.per.host</name>
>   <value>50</value>
> </property>
>
> <property>
>   <name>http.max.delays</name>
>   <value>1</value>
> </property>
>
> Any pointers are greatly appreciated!! Thanks in advance.
>
> AL
>

--
View this message in context: http://www.nabble.com/tweaking-config-files-for-better-performance-tf4119552.html#a11750336
Sent from the Nutch - User mailing list archive at Nabble.com.

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
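
As a rough sanity check on the numbers in the quoted message, here is a minimal Python sketch of how same-host blocking can cap throughput. It is plain arithmetic, not Nutch's actual scheduling logic. The function names are hypothetical, and both the single-host case and the 0.5-second fetch time are assumptions rather than measurements from this thread.

```python
def effective_slots(total_threads: int, threads_per_host: int, hosts: int) -> int:
    """Concurrent fetches possible when each host admits a fixed number of threads."""
    if min(total_threads, threads_per_host, hosts) < 1:
        raise ValueError("all arguments must be >= 1")
    # Threads beyond the per-host allowance have nothing to do but wait.
    return min(total_threads, threads_per_host * hosts)


def max_pages_per_sec(slots: int, secs_per_page: float) -> float:
    """Upper bound on throughput if every slot is always busy."""
    if secs_per_page <= 0:
        raise ValueError("secs_per_page must be positive")
    return slots / secs_per_page


def implied_secs_per_page(slots: int, observed_pages_per_sec: float) -> float:
    """Average time each slot spends per page, given an observed rate."""
    if observed_pages_per_sec <= 0:
        raise ValueError("observed_pages_per_sec must be positive")
    return slots / observed_pages_per_sec


# Values from the quoted nutch-site.xml; hosts=1 is an assumption (one Intranet host).
slots = effective_slots(total_threads=200, threads_per_host=50, hosts=1)
print(slots)                               # 50: the other 150 threads cannot help
print(max_pages_per_sec(slots, 0.5))       # 100.0, assuming 0.5 s per page
print(implied_secs_per_page(slots, 8.0))   # 6.25 s per page at the observed 8 pages/sec
```

Under this simplified model, 8 pages/sec with 50 usable slots would mean each slot spends about 6 seconds per page, which may suggest the threads are mostly waiting rather than fetching. That is only a reading of the model, not a diagnosis of this crawl.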