Think of it in partition terms. If you know that your map-splits X, Y
and Z won't emit any key of partition P, then the Pth reducer can jump
ahead and run without waiting for X, Y and Z to complete their
processing.
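
A rough sketch of that check, in Python rather than Hadoop's Java and
purely for illustration. It assumes you somehow know up front which
keys each map-split can emit, which stock MapReduce does not tell you.
The names partition_of, partitions_touched and runnable_reducers are
made up for this example, and the CRC32-modulo partitioner is only a
stand-in for whatever Partitioner your job really uses.

```python
import zlib


def partition_of(key, num_reducers):
    """Stand-in partitioner (an assumption, not Hadoop's own):
    deterministic CRC32 of the key, modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partitions_touched(keys, num_reducers):
    """Set of partitions that the given keys would land in."""
    return {partition_of(k, num_reducers) for k in keys}


def runnable_reducers(split_keys, completed_splits, num_reducers):
    """Reducers whose partition gets no key from any unfinished split.

    split_keys maps a split name to the keys it may emit (assumed to
    be known in advance); completed_splits is the set of split names
    that have already finished.
    """
    blocked = set()
    for split, keys in split_keys.items():
        if split not in completed_splits:
            # An unfinished split blocks every partition it can reach.
            blocked |= partitions_touched(keys, num_reducers)
    return sorted(set(range(num_reducers)) - blocked)
```

For example, runnable_reducers({"X": ["a"], "Y": ["b", "c"]}, {"Y"}, 4)
lists every reducer except the one that key "a" maps to, since only X
is still running. Reducers in that list are the ones that could "jump
ahead" in the sense above.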

Otherwise, a reducer can't run until all maps have completed, for fear
of losing a few keys that may have come out of the maps it has skipped
fetching from. To some this loss may be tolerable, and others would be
OK with receiving those keys later - but that's going to add complexity
when you could just fetch continuously and wait.

It should be easy to take the MRv2 application [0] and add such a
thing today, if you need it.

[0] - Given the confusion between what MRv2 and YARN mean individually
(they get mixed up too much), hope this blog post of mine helps:
http://www.cloudera.com/blog/2012/10/mr2-and-yarn-briefly-explained/

On Sat, Oct 13, 2012 at 7:46 AM, Jay Vyas <jayunit...@gmail.com> wrote:
> Is it possible for reducers to start (not just copying, but actually)
> "reducing" before all mappers are done, speculatively?
>
> In particular I'm asking this because I'm curious about the internals of how
> the shuffle and sort might (or might not :)) be able to support this.



-- 
Harsh J
