[ 
https://issues.apache.org/jira/browse/HADOOP-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549651
 ] 

Amar Kamat commented on HADOOP-2247:
------------------------------------

_THE *WAIT-KILL* DILEMMA_
Following are the issues to consider while deciding whether a map should be 
killed or not. Earlier, the backoff function backed off by a random amount 
between 1min-6min. After HADOOP-1984, the backoff is exponential in nature. The 
total amount of time spent by a reducer on fetching a map output before giving 
up is {{max-backoff}}. In all, ({{3 * max-backoff}}) time is required for a 
reducer to get a map task killed. So the first thing to do is to adjust the 
{{mapred.reduce.max.backoff}} parameter so that the map is not killed too 
early. Other parameters we are working on are as follows:
* *Reducer-health* : There should be a way to decide how the reducer is 
performing. One such parameter is ({{num-fail-fetches/num-fetches}}). Roughly, 
this ratio > 50% conveys that the reducer is not performing well enough.
* *Reducer-progress* : There should be a way to decide how the reducer is 
progressing. One such parameter is ({{num-outputs-fetched/num-maps}}). Roughly, 
this ratio > 50% conveys that the reducer has made considerable progress.
* *Avg map completion time* : This time should determine when a fetch attempt 
should be considered failed and hence reported to the JT.
* *Num-reducers* : The number of reducers in a particular job might provide 
some insight into how contended the resources might be. (A low number of 
reducers + failing fetches of a single map's output) indicates that the problem 
is on the map side. If the reducer is not able to fetch any map output then the 
problem is on the reducer side. If there are many reducers and many map-fetch 
failures then there is a high chance of congestion.

One thing to notice is that
* it requires ({{3 * max-backoff}}) amount of time to kill a map.
* it requires 5 minutes (in the worst case) to kill a reducer when 5 fetches 
fail simultaneously.
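
To make the timing above concrete, here is a minimal sketch; the initial delay, 
the doubling factor, and the 300s value are assumptions for illustration, not 
the values the framework actually uses.

{code:java}
// Rough illustration of the wait-kill timing discussed above. The initial
// delay and doubling factor are assumptions; only the cap (max-backoff)
// and the hard-coded 3 failure reports come from the discussion.
public class WaitKillTiming {
  public static void main(String[] args) {
    long maxBackoffSec = 300;   // mapred.reduce.max.backoff, assumed 5 minutes here
    long waited = 0, delay = 4; // assumed first retry after ~4s
    while (waited + delay <= maxBackoffSec) {
      waited += delay;          // reducer sleeps, then retries the fetch
      delay *= 2;               // exponential backoff after HADOOP-1984
    }
    // One reducer gives up on a map output after roughly max-backoff seconds,
    // and the JT needs 3 such reports before it kills the map.
    System.out.println("reducer gives up after ~" + waited + "s");
    System.out.println("map killed after      ~" + (3 * maxBackoffSec) + "s");
  }
}
{code}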

A better strategy would be to use
* *avg-map-completion-time* as a parameter in deciding when to report a 
failure. {{max-backoff}} should also depend on the avg map completion time.
* *num-reducers* as a parameter in deciding how much to back off and whether 
the map should be killed or the reducer should back off (wait).
* *(num-maps - num-finished)* and *(num-fetch-fail / num-fetched)* as 
parameters in deciding when to kill the reducer. A good strategy would be to 
kill a reducer if it fails to fetch the output of 50% of the maps and not many 
map outputs have been fetched. It could be the case that the reducer has 
fetched the map outputs but with some failures; then the fetch-fail ratio will 
be higher but the progress will also be considerable. We don't want to penalize 
a reducer which has fetched many map outputs with a lot of failures. (See the 
sketch after this list.)
* *ratio-based-map-killing* : The JT should also kill a map based on some 
percentage, along with the hard-coded number 3. For example, kill a map if 50% 
of the reducers report failures and num-reports >= 3. It might also help the JT 
to have a global idea of which map outputs are being tried, so that the 
scheduling of new tasks and the killing of maps can be decided.
* *fetch-success event notification* : The JT should be informed by a reducer 
about a successful map-output-fetch event, as a result of which the counters 
regarding the killing of that map should be reset. In a highly congested 
system, finding 3 reducers that fail on the first attempt for a particular map 
is easy.
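
As a rough sketch of how the above rules could fit together (the 50% 
thresholds, the reuse of the hard-coded 3, and all method names are 
illustrative assumptions, not existing APIs):

{code:java}
// Illustrative-only sketch of the wait/kill heuristics proposed above.
public class WaitKillPolicy {

  // Reducer side: give up only if the fail ratio is high AND progress is low,
  // so a reducer that fetched many outputs despite failures is not penalized.
  static boolean shouldKillReducer(int numFetchFail, int numFetched, int numMaps) {
    double failRatio = (numFetched == 0) ? 1.0 : (double) numFetchFail / numFetched;
    double progress  = (double) numFetched / numMaps;
    return failRatio > 0.5 && progress < 0.5;
  }

  // JT side: kill a map on a percentage of complaining reducers, not just a count.
  static boolean shouldKillMap(int failureReports, int numReducers) {
    return failureReports >= 3 && failureReports >= 0.5 * numReducers;
  }

  public static void main(String[] args) {
    System.out.println(shouldKillReducer(40, 60, 100)); // many failures, good progress -> false
    System.out.println(shouldKillReducer(30, 20, 100)); // failing and stuck            -> true
    System.out.println(shouldKillMap(3, 4));             // 3 reports = 75% of reducers -> true
    System.out.println(shouldKillMap(3, 100));           // 3 reports = 3% of reducers  -> false
  }
}
{code}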
 ----
Comments?

> Mappers fail easily due to repeated failures
> --------------------------------------------
>
>                 Key: HADOOP-2247
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2247
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.15.0
>         Environment: 1400 Node hadoop cluster
>            Reporter: Srikanth Kakani
>            Priority: Blocker
>             Fix For: 0.15.2
>
>
> Related to HADOOP-2220, problem introduced in HADOOP-1158
> At this scale, hardcoding the number of fetch failures to a static number (in 
> this case 3) is never going to work. Since the jobs we are running are 
> loading the systems, 3 failures can randomly occur within the lifetime of a 
> map. Even fetching the data can cause enough load for that many failures to 
> occur.
> We believe that the number of tasks and the size of the cluster should be 
> taken into account. Based on this, we believe that a ratio between total 
> fetch attempts and total failed attempts should be taken into consideration.
> Given our experience, a task should be declared "Too many fetch failures" 
> based on:
> failures > n /*could be 3*/ && (failures/total attempts) > k% /*could be 
> 30-40%*/
> Basically, the first factor gives some head start to the second factor; the 
> second factor then takes into account the cluster size and the task size.
> Additionally, we could take recency into account, say failures and attempts 
> in the last hour. We do not want to make the window too small.
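
A minimal sketch of the check proposed in the description above; n and k use 
the example values mentioned there (3 and ~30%), not agreed-upon defaults.

{code:java}
// Sketch of the condition proposed in the issue description; n and k are the
// example values from the description, not decided values.
public class FetchFailureCheck {
  static boolean tooManyFetchFailures(int failures, int totalAttempts) {
    final int n = 3;        // first factor: a small head start
    final double k = 0.30;  // second factor: scales with cluster and task size
    return failures > n
        && totalAttempts > 0
        && (double) failures / totalAttempts > k;
  }

  public static void main(String[] args) {
    System.out.println(tooManyFetchFailures(4, 8));    // 50% of attempts failed -> true
    System.out.println(tooManyFetchFailures(4, 100));  // only 4% failed         -> false
  }
}
{code}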

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
