On Wed, Jan 24, 2007 at 08:35:41PM -0800, Doug Judd wrote:
> OK, give me an example where the map() and/or reduce() phases dominate the
> computation.

First, I believe you when you say that your MapReduce jobs are dominated
by sorting.  A lot of people do a lot of different things.  My stuff
spends very little time in sorting.

I'm currently using MapReduce for evolutionary computation.  A few days
ago I ran a job on 128 processors that took 17.7 minutes.  There were
only 1000 records to be sorted, and based on past experiments, I'd be
surprised if 15 seconds were spent on sorting.  (By the way, a
single-processor implementation of the same algorithm, without any
parallelization and without any sorting, took a day and a half to do
the same computations.)
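
As a rough sanity check on those figures, here is the arithmetic as a
small Python sketch.  It assumes "a day and a half" means about 36
hours and treats 15 seconds as an upper bound on sort time; both are
approximations, not measurements.

```python
# Rough arithmetic on the figures quoted above.
# Assumption: "a day and a half" is taken as 36 hours of serial run time.
# Assumption: 15 seconds is an upper bound on sort time, not a measurement.
PROCESSORS = 128
parallel_minutes = 17.7
serial_minutes = 36 * 60
sort_seconds_upper = 15

# Speedup of the parallel job over the single-processor run.
speedup = serial_minutes / parallel_minutes

# Fraction of ideal (linear) speedup achieved on 128 processors.
efficiency = speedup / PROCESSORS

# Share of the parallel job's wall-clock time that could be sorting.
sort_fraction = sort_seconds_upper / (parallel_minutes * 60)

print(f"speedup:       about {speedup:.0f}x")
print(f"efficiency:    about {efficiency:.0%}")
print(f"sort fraction: at most about {sort_fraction:.1%}")
```

Under those assumptions the sort is a percent or two of the run, and
the rest is map and reduce work.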

Sure, the map phase is only order n and the sort is n log n, but that's
irrelevant here.  Complexity is overrated.  It says a lot about
asymptotic scalability, but on its own it doesn't tell you how long
something will take to run.  That pesky constant multiplier can make a
huge difference.
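
To make the constant-multiplier point concrete, here is a toy cost
model in Python.  The per-record map cost and per-comparison sort cost
are hypothetical values chosen only for illustration (an expensive
fitness evaluation against a cheap comparison); they are not measured
from my job.

```python
import math

def map_time(n, c_map):
    """Linear cost model: c_map seconds of map work per record."""
    return c_map * n

def sort_time(n, c_sort):
    """n log n cost model: c_sort seconds per comparison-like step."""
    return c_sort * n * math.log2(n) if n > 1 else 0.0

def crossover_log2_n(c_map, c_sort):
    """log2 of the n at which sort time catches up with map time.

    Solving c_sort * n * log2(n) = c_map * n gives log2(n) = c_map / c_sort.
    """
    return c_map / c_sort

# Hypothetical constants, for illustration only.
C_MAP = 100.0   # seconds of map work per record
C_SORT = 1e-6   # seconds per sort step

n = 1000
print("map seconds: ", map_time(n, C_MAP))
print("sort seconds:", sort_time(n, C_SORT))
print("log2 of crossover n:", crossover_log2_n(C_MAP, C_SORT))
```

With constants that lopsided, the sort would only catch up at an n far
larger than any dataset that could exist, so the n log n term never
matters in practice.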

Anyway, it's a different perspective.

Thanks.


-- 
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868
