If you are doing a lot of URL filtering with regular expressions, this can
take a great deal of time in the reduce phase. Some speedups may be
possible, depending on your usage patterns; some are as simple as config
changes, while others require a patch (which I haven't contributed back yet,
but will).

Let me know if you do a lot of filtering, and I'll post a longer list of
suggestions.

     -Doug


Benjamin Higgins wrote:
> 
> I'd like to know what all the known techniques are for speeding up
> MapReduce on a single-user machine.
> 
> So far, I know of this patch:
> 
> http://issues.apache.org/jira/browse/NUTCH-395
> 
> I also am reading that changing hadoop-site.xml can help, but I don't know
> what changes to make.
> 
> Please add anything you've found that will help.  I am considering going
> back to 0.7 if I can't get Nutch to go faster.  In my case I am also
> crawling just a single site.
> 
> Ben

-- 
View this message in context: 
http://www.nabble.com/Guide-to-speeding-up-Map-Reduce-on-single-machine-setup-tf2680869.html#a7479019
Sent from the Nutch - User mailing list archive at Nabble.com.


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
