Hi,

I'm new to Hadoop, and I'm trying to figure out the best way to use it
to parallelize a large number of calls to a web API, and then process
and store the results.

The calls will be regular HTTP requests, and the URLs follow a known
format, so they can be generated easily. I'd like to understand how to
apply the MapReduce pattern to this task – should I have one mapper
generating URLs, and another making the HTTP calls and mapping request
URLs to their response documents, for example?

Any links to sample code, examples, etc. would be great.

Cheers,
Phil
