Hi,
  Currently I am running Nutch on a single Linux box with 1 GB of memory and one 
3 GHz Intel P4 CPU. Hadoop is running in local mode. Now I am trying to 
reparse the HTML pages already fetched. The process is very slow: it requires more than 10 
days to process nearly 20M pages. I am wondering whether the two solutions 
below can improve the performance:
1. Increase the memory size?
2. Run Hadoop in distributed mode, and run more than one map/reduce task on one 
machine?
Any suggestions about improving the performance are welcome! Thanks in advance!
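
Concretely, what I have in mind for option 2 is a hadoop-site.xml fragment like the one below. The property names are taken from the Hadoop 0.x releases Nutch ships with (please check them against your version), and the values are only guesses on my part:

```xml
<!-- hadoop-site.xml fragment: a sketch, assuming Hadoop 0.x property
     names; both values below are guesses to be tuned, not recommendations -->
<configuration>
  <!-- allow more than one task to run concurrently on this machine -->
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- give each child JVM more heap for parsing large HTML pages -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
</configuration>
```

On a single-CPU P4 I am not sure running two tasks at once actually helps, which is part of my question.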
 

-chee
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
