Hi Guys,
I have a cluster of 2 machines. I tried to crawl a website that contains
over 1M pages, and I noticed that it takes a few days to complete the crawl.
The logs report 0.5 pages/s at 200 kb/s, which seems very slow. I would like
to try Fetcher2; I guess it might improve performance.
It might be a stupid question, but I'm wondering how to set up Nutch to
use Fetcher2 instead of Fetcher.
Could you help me understand?
Besides, what is the usual value for fetcher.server.delay? I was told that
we should set this property to 1 second, but I can see in nutch-default.xml
that it is set to 5. What is the best setting to gain performance while
staying polite enough?
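For reference, I assume the override I was told about would go inside the
<configuration> element of conf/nutch-site.xml and look roughly like this
(the 1.0 value is only the suggestion I received, not something I have
verified to be polite enough):

```xml
<!-- conf/nutch-site.xml: overrides the default from nutch-default.xml -->
<property>
  <name>fetcher.server.delay</name>
  <!-- delay in seconds between successive requests to the same server;
       1.0 is the value I was told to use, the shipped default is 5 -->
  <value>1.0</value>
</property>
```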
Any other tips for improving performance are welcome.
E
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general