This article by Mat Kelcey about processing Common Crawl data looks interesting too:
http://matpalm.com/blog/2011/12/10/common_crawl_visible_text/

Cheers,
Tom

On Tue, Dec 20, 2011 at 1:47 AM, Andrei Savu <[email protected]> wrote:
> Here is an interesting article about how to process Common Crawl data using
> Amazon EMR:
> http://www.commoncrawl.org/mapreduce-for-the-masses/
>
> I think we should be able to do something similar with Whirr quite easily.
>
> I will give it a try soon.
>
> -- Andrei Savu
