Hi,

Thanks for the reply!

I've tried the configurations from the link, but they didn't seem to help
much, at least not enough to get it up to 20 pages/sec. Could it be because
I'm crawling an Intranet site?
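
To sanity-check that idea, here is a rough back-of-the-envelope sketch using
Little's law (concurrency = throughput x latency). It assumes, and this is
only my assumption, that the whole Intranet is effectively one host and that
all 50 per-host threads from my config are busy fetching the whole time. The
function names are just made up for illustration; none of this is Nutch code.

```python
# Back-of-the-envelope check using Little's law: L = lambda * W
# (concurrent fetches = pages/sec * seconds per page).
# ASSUMPTION: one host, and every per-host thread is always busy fetching.

def implied_latency_seconds(concurrent_fetches: int, pages_per_second: float) -> float:
    """Average seconds per page implied by a given concurrency and throughput."""
    return concurrent_fetches / pages_per_second


def required_concurrency(target_pages_per_second: float, latency_seconds: float) -> float:
    """Concurrent fetches needed to hit a target rate at a given latency."""
    return target_pages_per_second * latency_seconds


threads_per_host = 50   # fetcher.threads.per.host from my nutch-site.xml
observed_rate = 8.0     # best pages/sec I have seen so far
target_rate = 20.0      # pages/sec other people report

latency = implied_latency_seconds(threads_per_host, observed_rate)
needed = required_concurrency(target_rate, latency)

print(f"implied seconds per page: {latency}")
print(f"concurrent fetches needed for {target_rate} pages/sec: {needed}")
```

Under those assumptions this works out to 6.25 seconds per page, which seems
far too slow for an Intranet server. So my guess is that the threads spend
most of their time waiting rather than fetching, but I can't confirm that.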

I'd really like to know how other people got their crawls to run so
fast.

Any pointers are appreciated! Thanks!!

Audrey


Audrey Liu wrote:
> 
> Hi,
> 
> I am using Nutch 0.9, and I'm trying to crawl our Intranet site (~60,000
> pages, ~28,000 of them HTML). I've seen other posts where people mentioned
> they can get their crawler to do 20 pages/sec, but the best I've seen so far
> is only 8 pages/sec.
> 
> I've also read that the fetcher threads tend to block when they try to
> fetch pages from the same host. So I'm wondering what configuration I
> should set to get the best performance. My current configuration in
> nutch-site.xml is as follows:
> 
> <property>
>   <name>fetcher.threads.fetch</name>
>   <value>200</value>
> </property>
> 
> <property>
>   <name>fetcher.threads.per.host</name>
>   <value>50</value>
> </property>
> 
> <property>
>   <name>http.max.delays</name>
>   <value>1</value>
> </property>
> 
> Any pointers are greatly appreciated!! Thanks in advance.
> 
> AL
> 

-- 
View this message in context: 
http://www.nabble.com/tweaking-config-files-for-better-performance-tf4119552.html#a11750336
Sent from the Nutch - User mailing list archive at Nabble.com.

