30,000 records in 10 MB files.
Both the file sizes and the number of records can vary.




If the data is 10 MB and you have 30k records, and each record takes ~2
minutes to process, I'd suggest using the map phase to distribute the
records across several reducers and then doing the actual processing in
reduce.
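The idea above can be sketched without a Hadoop cluster: the mapper does almost no work and only tags each record with a partition key so the framework spreads them evenly, the shuffle groups records by that key, and the reducers run the expensive per-record computation. A minimal, self-contained simulation (the record names and the trivial `process` function are made up for illustration; in a real job the heavy work would replace `process`):

```python
from collections import defaultdict

def map_phase(records, num_reducers):
    # The mapper is cheap: it only emits (partition, record) pairs
    # so records are spread round-robin across the reducers.
    for i, record in enumerate(records):
        yield i % num_reducers, record

def shuffle(pairs):
    # Group records by partition key, as the framework's shuffle would.
    groups = defaultdict(list)
    for key, record in pairs:
        groups[key].append(record)
    return groups

def reduce_phase(groups, process):
    # The reducers do the actual (expensive) per-record processing.
    return {key: [process(r) for r in recs] for key, recs in groups.items()}

records = [f"record-{n}" for n in range(10)]
groups = shuffle(map_phase(records, num_reducers=3))
results = reduce_phase(groups, process=str.upper)
```

With 3 reducers, the 10 records split 4/3/3, so each reducer's share of the ~2-minutes-per-record work runs in parallel with the others.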
Hmmm... good idea, thanks. But is the reduce phase actually optimized for doing the heavy part of the computation?
