Hi,

I am using Nutch 0.9 to crawl our intranet site (~60,000 pages, of which
~28,000 are HTML). I've seen other posts where people mention getting their
crawler to do 20 pages/sec, but the best I've seen so far is only 8
pages/sec.
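
To put those two rates in perspective, here is a rough back-of-the-envelope
sketch of how long a full crawl would take at each one. It assumes a constant
fetch rate and that all ~60,000 pages are fetched; the function name is only
illustrative, and real crawls will vary with politeness delays and server load.

```python
# Rough crawl-duration estimate at a steady fetch rate.
# The page count and the two rates come from the message above;
# a constant rate is an assumption, not a measurement.

def crawl_minutes(total_pages: int, pages_per_sec: float) -> float:
    """Return the minutes needed to fetch total_pages at a constant rate."""
    if pages_per_sec <= 0:
        raise ValueError("pages_per_sec must be positive")
    return total_pages / pages_per_sec / 60.0


TOTAL_PAGES = 60_000  # approximate size of the intranet site

if __name__ == "__main__":
    for rate in (8, 20):
        minutes = crawl_minutes(TOTAL_PAGES, rate)
        print(f"{rate} pages/sec -> {minutes:.0f} minutes")
```

So the gap between the two rates is roughly two hours versus under one hour
per full crawl.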

I've also read that fetcher threads tend to block when they try to fetch
pages from the same host. What configuration should I use to get the best
performance? My current settings in nutch-site.xml are as follows:

<property>
  <name>fetcher.threads.fetch</name>
  <value>200</value>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>50</value>
</property>

<property>
  <name>http.max.delays</name>
  <value>1</value>
</property>

Any pointers would be greatly appreciated. Thanks in advance.

AL

