Cool, sounds like you're doing some interesting stuff.  Thanks for your
perspective.

- Doug

On 1/24/07, Andrew McNabb <[EMAIL PROTECTED]> wrote:

On Wed, Jan 24, 2007 at 08:35:41PM -0800, Doug Judd wrote:
> OK, give me an example where the map() and/or reduce() phases dominate
> the computation.

First, I believe you when you say that your MapReduce jobs are dominated
by sorting.  A lot of people do a lot of different things.  My stuff
spends very little time in sorting.

I'm currently using MapReduce for evolutionary computation.  I ran a job
a few days ago on 128 processors which took 17.7 minutes to run.  There
were only 1000 records to be sorted, and based on past experiments, I'd
be surprised if even 15 seconds were spent on sorting.  (By the way, a
single-processor implementation of the same algorithm, without any
parallelization and without any sorting, took a day and a half to
perform the same computations.)

Sure, the map phase is only order n and the sort is n log n, but that's
completely irrelevant.  Complexity is overrated.  It says a lot about
asymptotic scalability, but alone it doesn't tell you anything about how
long something will take to run.  That pesky constant multiplier can
make a huge difference.
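
A quick sketch of the point (my own illustration, not your actual code):
give the O(n) map phase a deliberately expensive per-record function, and
it easily dwarfs the O(n log n) sort of the same records, even though the
sort has the worse asymptotic complexity.

```python
import timeit

def expensive_map(record):
    # Stand-in for a costly per-record computation (e.g. a fitness
    # evaluation); this is a made-up workload, not any real MapReduce API.
    return sum(i * i for i in range(5000)) + record

n = 1000
records = list(range(n))

# Time one pass of the "map" phase over all records.
map_time = timeit.timeit(lambda: [expensive_map(r) for r in records],
                         number=1)
# Time sorting the same 1000 records.
sort_time = timeit.timeit(lambda: sorted(records, reverse=True), number=1)

print(map_time > sort_time)  # → True: the constant multiplier wins
```

Sorting 1000 small records takes a fraction of a millisecond; the map
phase's big constant makes it the bottleneck regardless of big-O.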

Anyway, it's a different perspective.

Thanks.


--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868


