I've installed Nutch 0.8.1 on a single node (a P4 3.0 GHz dual-processor
machine with 1.5 GB of RAM), and my configuration is
threads = 20
depth = 1000
topN = 1000
on Linux 2.6.9.
I'm trying an intranet crawl of a site with more than 50,000 pages.
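For reference, the crawl is launched roughly like this (the seed and output
directory names below are just placeholders, not the actual paths I use):

    bin/nutch crawl urls -dir crawl -threads 20 -depth 1000 -topN 1000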
I've launched the JVM with the standard heap option JAVA_HEAP_MAX=-Xmx1600m.
After 10 hours of crawling, my logs contain a lot of OutOfMemoryError entries:
2007-07-18 10:14:32,535 INFO fetcher.Fetcher - fetch of
http://www.anci.it/stampa.cfm?layout=dettaglio&IdSez=2446&IdDett=5936
failed with: java.lang.OutOfMemoryError
and the Nutch process died.
Has anyone had this kind of experience? I believe I could increase
JAVA_HEAP_MAX, but the problem would probably just come back later.
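In case it is useful, the change I have in mind is the heap setting in
bin/nutch; the 2000m value below is only an example, and with 1.5 GB of
physical RAM a larger heap may not help much anyway:

    # in bin/nutch
    JAVA_HEAP_MAX=-Xmx2000m

Some releases also read a NUTCH_HEAPSIZE environment variable (in MB) before
falling back to that default, but I am not sure 0.8.1 does, so the script
itself would need to be checked first.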
Is the problem related to my configuration, or is it a memory leak in the
Fetcher class? Do I need to set up a distributed configuration with Hadoop?
Pierluigi D'Amadio