This article by Mat Kelcey about processing Common Crawl data looks
interesting too:

http://matpalm.com/blog/2011/12/10/common_crawl_visible_text/

Cheers,
Tom

On Tue, Dec 20, 2011 at 1:47 AM, Andrei Savu wrote:
> Here is an interesting article about how to process Common Crawl data using
> Amazon EMR:
> http://www.commoncrawl.org/mapreduce-for-the-masses/
>
> I think we should be able to do something similar with Whirr quite easily.
>
> I will give it a try soon.
>
> -- Andrei Savu
