[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903717#action_12903717
 ] 

Dick King commented on MAPREDUCE-2039:
--------------------------------------

The runtime space requirements for this will be noticeable but modest.  Each 
task in progress will need a {{float}} or two for the exponentially smoothed 
value, plus an {{int}} for the time of the most recent update [needed for the 
exponential smoothing calculation].  Although we internally represent times as 
a {{long}}, an {{int}} is enough here because a 32-bit millisecond count takes 
about 49.7 days to wrap around.  Jobs, and therefore tasks, can't run that 
long for other reasons.
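
To make the per-task footprint concrete, here is a minimal sketch of the state 
described above.  It is an illustration under stated assumptions, not the 
patch: the class name {{SmoothedProgressRate}}, its methods, the millisecond 
time unit, and the choice to seed the smoothed value with the first observed 
rate are all hypothetical.  Lambda would normally be one shared configuration 
value rather than a per-task field; it is kept in the object here only so the 
example is self-contained.

```java
/** Hypothetical per-attempt tracker; not an actual Hadoop class. */
class SmoothedProgressRate {
    // Smoothing constant per millisecond, e.g. 1.0 / 60_000 for "1/minute".
    private final double lambdaPerMilli;
    // Smoothed progress per millisecond; NaN until the first update.
    private float smoothedRate = Float.NaN;
    // Progress (0..1) reported at the most recent update.
    private float lastProgress = 0.0f;
    // Low 32 bits of the most recent update time in milliseconds.
    private int lastUpdateMillis;

    SmoothedProgressRate(double lambdaPerMilli, long startMillis) {
        this.lambdaPerMilli = lambdaPerMilli;
        this.lastUpdateMillis = (int) startMillis; // keeps the low 32 bits
    }

    /** Folds a new progress report into the smoothed rate. */
    void update(float progress, long nowMillis) {
        int now = (int) nowMillis;
        // int subtraction wraps, so the unsigned difference is the true gap
        // as long as that gap is shorter than 2^32 ms.
        long elapsed = Integer.toUnsignedLong(now - lastUpdateMillis);
        if (elapsed == 0) {
            return; // no time has passed; wait for the next report
        }
        float instantRate = (progress - lastProgress) / elapsed;
        if (Float.isNaN(smoothedRate)) {
            smoothedRate = instantRate; // assumption: seed with first sample
        } else {
            // The weight of the new sample grows with the time since the
            // last update, so irregular report intervals are handled.
            double alpha = 1.0 - Math.exp(-lambdaPerMilli * elapsed);
            smoothedRate += (float) (alpha * (instantRate - smoothedRate));
        }
        lastProgress = progress;
        lastUpdateMillis = now;
    }

    float ratePerMilli() {
        return smoothedRate;
    }

    /** Estimated milliseconds to completion at the current smoothed rate. */
    long estimatedMillisRemaining() {
        if (!(smoothedRate > 0.0f)) {
            return Long.MAX_VALUE; // no estimate yet, or no forward progress
        }
        return (long) ((1.0f - lastProgress) / smoothedRate);
    }
}
```

In this sketch the per-task state is two {{float}} fields and one {{int}}.  An 
attempt that stalls sees its smoothed rate decay by a factor of e for each 
1/lambda of elapsed time, so its estimated finish time grows quickly even when 
its early progress was normal.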

> Improve speculative execution
> -----------------------------
>
>                 Key: MAPREDUCE-2039
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2039
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>
> In speculation, the framework issues a second task attempt on a task where 
> one attempt is already running.  This is useful if the running attempt is 
> bogged down for reasons outside of the task's code, so a second attempt 
> finishes ahead of the existing attempt, even though the first attempt has a 
> head start.
> Early versions of speculation had the weakness that an attempt that starts 
> out well but breaks down near the end would never get speculated.  That got 
> fixed in HADOOP-2141, but in the fix the speculation wouldn't engage until 
> the performance of the old attempt, _even counting the early portion where it 
> progressed normally_ , was significantly worse than average.
> I want to fix that by overweighting the more recent progress increments.  In 
> particular, I would like to use exponential smoothing with a lambda of 
> approximately 1/minute [which is the time scale of speculative execution] to 
> measure progress per unit time.  This affects the speculation code in two 
> places:
>    * It affects the set of task attempts we consider to be underperforming
>    * It affects our estimates of when we expect tasks to finish.  This could 
> be hugely important; speculation's main benefit is that it gets a single 
> outlier task finished earlier than otherwise possible, and we need to know 
> which task is the outlier as accurately as possible.
> I would like a rich suite of configuration variables, minimally including 
> lambda and possibly weighting factors.  We might have two exponentially 
> smoothed tracking variables of the progress rate, to distinguish attempts 
> that are bogged down and getting worse from those that are bogged down but 
> improving.
> Perhaps we should be especially eager to speculate a second attempt.  If a 
> task is deterministically failing after bogging down [think "rare infinite 
> loop bug"] we would rather run a couple of attempts in parallel to 
> discover the problem sooner.
> As part of this patch we would like to add benchmarks that simulate rare 
> tasks that behave poorly, so we can discover whether this change in the code 
> is a good idea and what the proper configuration is.  Early versions of this 
> will be driven by our assumptions.  Later versions will be driven by the 
> fruits of MAPREDUCE-2037.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
