Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by mozdevil:
http://wiki.apache.org/nutch/Nutch0%2e9-Hadoop0%2e10-Tutorial

------------------------------------------------------------------------------
  }}}
  
  Copy the data to the local filesystem; searching can then be done on the new data.
-   
+ 
+ = Comments =
+ == Number of map reduce tasks ==
  I noticed that the number of map and reduce tasks has an impact on the 
performance of Hadoop. Often, after crawling a lot of pages, the nodes 
reported 'java.lang.OutOfMemoryError: Java heap space' errors; this also 
happened in the indexing part. Increasing the number of maps solved these 
problems: with an index of over 200,000 pages I needed 306 maps in total over 
3 machines. Setting the mapred.map.tasks property in hadoop-site.xml to 99 
(much higher than what is advised in other tutorials and in the hadoop-site.xml 
file) solved the problem.
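  
  As a rough sketch, the setting described above might look like this in
  hadoop-site.xml. This assumes the property meant is Hadoop's
  mapred.map.tasks and that the entry goes inside the file's existing
  <configuration> element; the value 99 is the one reported above and is
  not a general recommendation. Hadoop treats this value as a hint, so the
  actual number of maps per job can differ.
  
  {{{
  <!-- Assumed property name: mapred.map.tasks (number of map tasks per job) -->
  <property>
    <name>mapred.map.tasks</name>
    <!-- 99 worked for the 3-machine setup described above; tune per cluster -->
    <value>99</value>
  </property>
  }}}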
  
+ See [http://wiki.apache.org/lucene-hadoop/HowManyMapsAndReduces] for more 
info about the number of map reduce tasks.
+ 

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
