Hi, I'm new to Hadoop, and I'm trying to figure out the best way to use it to parallelize a large number of calls to a web API, and then process and store the results.
The calls will be regular HTTP requests, and the URLs follow a known format, so they can be generated easily. I'd like to understand how to apply the MapReduce pattern to this task: should I have one mapper generating URLs and another making the HTTP calls, mapping each request URL to its response document, for example? A rough sketch of what I'm imagining is below. Any links to sample code, examples etc. would be great. Cheers, Phil
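To make the question concrete, here's a minimal sketch of one structure I'm considering (all class and path names are just placeholders I made up): generate the URLs offline into a text file in HDFS, one per line, then run a map-only job whose mapper fetches each URL and emits (url, response body). I'm not sure this is idiomatic, so corrections welcome.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UrlFetchJob {

    // Mapper: each input line is a URL; emits (url, response body).
    public static class FetchMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String urlString = value.toString().trim();
            if (urlString.isEmpty()) return;

            HttpURLConnection conn =
                    (HttpURLConnection) new URL(urlString).openConnection();
            conn.setConnectTimeout(10_000);  // guessed timeouts; tune as needed
            conn.setReadTimeout(30_000);

            StringBuilder body = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line).append('\n');
                }
            } finally {
                conn.disconnect();
            }
            context.write(new Text(urlString), new Text(body.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "url fetch");
        job.setJarByClass(UrlFetchJob.class);
        job.setMapperClass(FetchMapper.class);
        job.setNumReduceTasks(0);  // map-only: write fetched responses directly
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // file of URLs
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // results dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

My thinking with the map-only job (zero reducers) is that there's no aggregation step, so the mapper output can go straight to HDFS; if I later need to group or post-process responses, I assume I'd add a reducer. Does that structure make sense, or is there a better-established pattern for bulk HTTP fetching from Hadoop?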