I realize that the intended purpose of speculative execution is to overcome 
individual slow tasks...and I have read that it is explicitly *not* intended to 
start copies of a task simultaneously and race them, but rather to start 
copies of tasks that "seem slow" only after they have been running for a while.
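(For context, the knob I am referring to is the ordinary per-job speculative 
execution configuration, roughly like the sketch below.  I am using the old 
mapred-era property names here, which may differ in newer versions:)

    import org.apache.hadoop.conf.Configuration;

    public class SpeculationConfig {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Speculative execution only launches a backup for a task that looks
            // slow relative to its peers partway through the job, not from the start.
            conf.setBoolean("mapred.map.tasks.speculative.execution", true);
            conf.setBoolean("mapred.reduce.tasks.speculative.execution", true);
            System.out.println("map speculation: "
                + conf.get("mapred.map.tasks.speculative.execution"));
        }
    }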

...but aside from merely being slow, sometimes tasks arbitrarily fail, and not 
in data-driven or otherwise deterministic ways.  A task may fail and then 
succeed on a subsequent attempt...but the total job time is extended by the 
time wasted during the initial failed task attempt.

It would be super-swell to run copies of a task simultaneously from the starting 
line and simply kill the remaining copies after the winner finishes.  While this 
is "wasteful" in some sense (that is the argument offered for not running 
speculative execution this way to begin with), it would be more precise to say 
that different users have different priorities under different use-case 
scenarios.  The "wasting" of duplicate tasks on extra cores may be an 
acceptable cost toward the higher priority of minimizing job time for a given 
application.

Is there any notion of this in Hadoop?

________________________________________________________________________________
Keith Wiley     kwi...@keithwiley.com     keithwiley.com    music.keithwiley.com

"Luminous beings are we, not this crude matter."
                                           --  Yoda
________________________________________________________________________________
