Cool, sounds like you're doing some interesting stuff. Thanks for your perspective.
- Doug

On 1/24/07, Andrew McNabb <[EMAIL PROTECTED]> wrote:
On Wed, Jan 24, 2007 at 08:35:41PM -0800, Doug Judd wrote:
> OK, give me an example where the map() and/or reduce() phases dominate the
> computation.

First, I believe you when you say that your MapReduce jobs are dominated by sorting. A lot of people do a lot of different things. My stuff spends very little time in sorting.

I'm currently using MapReduce for evolutionary computation. I ran a job a few days ago on 128 processors which took 17.7 minutes to run. There were only 1000 records to be sorted, and based on past experiments, I'd be surprised if 15 seconds were spent on sorting. (By the way, a single-processor implementation of the same algorithm, without any parallelization and without any sorting, took a day and a half to perform the same computations.)

Sure, the map phase is only order n and the sort is n log n, but that's completely irrelevant. Complexity is overrated. It says a lot about asymptotic scalability, but alone it doesn't tell you anything about how long something will take to run. That pesky constant multiplier can make a huge difference.

Anyway, it's a different perspective. Thanks.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868
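(A minimal sketch of the point about constant factors, not Andrew's actual code: when each map call is expensive and there are only ~1000 records, the O(n) map phase can dwarf the O(n log n) sort, despite the sort's worse asymptotic complexity. The `expensive_fitness` function below is a hypothetical stand-in for a costly per-record evaluation, such as scoring a candidate in an evolutionary-computation job.)

```python
import random
import time

def expensive_fitness(record):
    # Hypothetical stand-in for a costly per-record map function,
    # e.g. evaluating one candidate solution. Its large constant
    # cost per record is what dominates the job's running time.
    total = 0.0
    for i in range(20000):
        total += (record * i) % 7
    return total

records = [random.random() for _ in range(1000)]

start = time.perf_counter()
mapped = [(expensive_fitness(r), r) for r in records]  # "map" phase: O(n)
map_time = time.perf_counter() - start

start = time.perf_counter()
mapped.sort()                                          # "sort" phase: O(n log n)
sort_time = time.perf_counter() - start

print(f"map:  {map_time:.4f} s")
print(f"sort: {sort_time:.6f} s")
```

On typical hardware the map phase here takes seconds while sorting 1000 tuples takes well under a millisecond, even though only the sort carries the log n factor.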
