I'm also getting very slow crawl rates, around 6 pages/sec. I haven't been
able to analyze the issue at length yet, for instance by using ntop to see
whether my network connection is pegged. However, I did get a modest
improvement (about 20%-30%) by following Sami Siren's suggestion:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06533.html

I've also tried fetch2, which I think is faster, though again I'm not seeing a
radical improvement.

--Kai

----- Original Message ----
From: Audrey Liu <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Friday, July 20, 2007 1:56:52 PM
Subject: tweaking config files for better performance


Hi,

I am using Nutch 0.9, and I'm trying to crawl our Intranet site (~60,000
pages, ~28,000 of them HTML). I've seen other posts where people mentioned
they can get their crawler to do 20 pages/sec, but the best I've seen so far
is only 8 pages/sec.

I've also read that the fetcher threads tend to block when they try to fetch
pages from the same host. So I'm wondering what configuration I should set
to get the best performance. My current configuration in nutch-site.xml is
as follows:

<property>
  <name>fetcher.threads.fetch</name>
  <value>200</value>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>50</value>
</property>

<property>
  <name>http.max.delays</name>
  <value>1</value>
</property>
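
A rough way to sanity-check these settings is Little's law (throughput =
concurrency / latency). The sketch below is a back-of-envelope estimate, not
anything Nutch provides: the function names are hypothetical, and it assumes
the Intranet site is served from a single host, so that at most
fetcher.threads.per.host of the 200 threads can be fetching at once.

```python
def effective_concurrency(total_threads, threads_per_host, num_hosts):
    """Upper bound on simultaneous fetches under a per-host thread cap."""
    return min(total_threads, threads_per_host * num_hosts)


def implied_latency_seconds(concurrency, pages_per_sec):
    """Little's law rearranged: mean time per page if every slot stays busy."""
    return concurrency / pages_per_sec


# Values from the config above; num_hosts=1 is an assumption about the site.
slots = effective_concurrency(total_threads=200, threads_per_host=50, num_hosts=1)
print(slots)                                # 50
print(implied_latency_seconds(slots, 8.0))  # 6.25
```

If those assumptions hold, 8 pages/sec with 50 busy slots would mean each
page takes about 6 seconds on average, which would point at the server or the
network rather than the thread counts. If the measured per-page time is much
lower, the slots are probably not all busy.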

Any pointers are greatly appreciated!! Thanks in advance.

AL
-- 
View this message in context: 
http://www.nabble.com/tweaking-config-files-for-better-performance-tf4119552.html#a11715927
Sent from the Nutch - User mailing list archive at Nabble.com.








       