If you are doing a lot of URL filtering with regular expressions, that can
take a massive amount of time in the reduce phase. Some speedups are
possible, depending on your usage patterns. Some are as simple as config
changes; others require a patch (which I haven't contributed back yet, but
will).
Let me know if you do a lot of filtering, and I'll post a longer list of
suggestions.
-Doug
Benjamin Higgins wrote:
>
> I'd like to know all the known techniques for speeding up MapReduce
> for a single user machine.
>
> So far, I know of this patch:
>
> http://issues.apache.org/jira/browse/NUTCH-395
>
> I have also read that changing hadoop-site.xml can help, but I don't know
> what changes to make.
>
> Please add anything you've found that will help. I am considering going
> back to 0.7 if I can't get Nutch to go faster. In my case I am also
> crawling just a single site.
>
> Ben
>
>
--
View this message in context:
http://www.nabble.com/Guide-to-speeding-up-Map-Reduce-on-single-machine-setup-tf2680869.html#a7479019
Sent from the Nutch - User mailing list archive at Nabble.com.