On Jun 11, 2009, at 11:06 AM, Tarandeep Singh wrote:

I am trying to understand the effects of increasing block size or minimum
split size. If I increase them, then a mapper will process more data,
effectively reducing the number of mappers that will be spawned. Since there is
overhead in starting mappers, this seems good.

Even more important is that the cost of the shuffle depends on the number of maps * reduces. For the sort benchmark, we found it much more performant to have a few very large maps (500MB+).
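
To make the maps * reduces point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes one map per split over a single large input and one fetch per (map, reduce) pair during the shuffle. The 1 TB input, the 100 reduces, and the helper names `num_maps` and `shuffle_fetches` are illustrative assumptions, not figures from the sort benchmark.

```python
MB = 1024 ** 2
TB = 1024 ** 4


def num_maps(input_bytes: int, split_bytes: int) -> int:
    """Approximate map count: one map per split, rounding up.

    Simplification: treats the input as one large file and ignores
    per-file boundaries, which can add extra splits in practice.
    """
    return -(-input_bytes // split_bytes)  # integer ceiling division


def shuffle_fetches(maps: int, reduces: int) -> int:
    """Each reduce pulls one output segment from every map."""
    return maps * reduces


if __name__ == "__main__":
    input_size = 1 * TB   # hypothetical input size
    reduces = 100         # hypothetical reduce count
    for split_mb in (64, 128, 512):
        maps = num_maps(input_size, split_mb * MB)
        fetches = shuffle_fetches(maps, reduces)
        print(f"split={split_mb}MB maps={maps} fetches={fetches}")
```

Under these assumptions, moving from 64MB to 512MB splits cuts the map count by 8x, and the number of shuffle fetches drops by the same factor because the reduce count is unchanged.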

-- Owen
