Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by mozdevil:
http://wiki.apache.org/nutch/Nutch0%2e9-Hadoop0%2e10-Tutorial

------------------------------------------------------------------------------
  }}}
  Copy the data to the local filesystem; searching can then be done on the new data.
- 
+ 
+ = Comments =
+ == Number of map reduce tasks ==
+ I noticed that the number of map and reduce tasks has an impact on the performance of Hadoop. After crawling a lot of pages, the nodes often reported 'java.lang.OutOfMemoryError: Java heap space' errors; this also happened in the indexing part. Increasing the number of maps solved these problems: with an index of over 200,000 pages I needed 306 maps in total over 3 machines. Setting the mapred.map.tasks property in hadoop-site.xml to 99 (much higher than what is advised in other tutorials and in the hadoop-site.xml file) solved the problem.
+ See [http://wiki.apache.org/lucene-hadoop/HowManyMapsAndReduces] for more info about the number of map and reduce tasks.
+ 
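
For reference, the hadoop-site.xml change described in the comment could look roughly like the fragment below. This is a minimal sketch, not the author's actual file: it assumes the standard Hadoop property name mapred.map.tasks and the usual property layout inside the configuration element. The value 99 is the one reported in the comment; the description text is illustrative only.

```xml
<?xml version="1.0"?>
<!-- Sketch of a hadoop-site.xml override; only the value 99 comes from the comment above. -->
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <!-- Value reported to avoid the Java heap space errors on a 3-machine cluster. -->
    <value>99</value>
    <description>Default number of map tasks per job (illustrative description).</description>
  </property>
</configuration>
```

As far as I know, Hadoop treats this value as a hint rather than a hard limit, so the number of maps actually run for a job can differ; the HowManyMapsAndReduces page linked in the comment covers this.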