[ https://issues.apache.org/jira/browse/HADOOP-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549651 ]
Amar Kamat commented on HADOOP-2247:
------------------------------------

_THE *WAIT-KILL* DILEMMA_

Following are the issues to be considered while deciding whether a map should be killed or not. Earlier, the backoff function used to back off by a random amount between 1 min and 6 min. After HADOOP-1984, the backoff function is exponential in nature. The total time spent by a reducer fetching a map output before giving up is {{max-backoff}}. In all, ({{3 * max-backoff}}) time is required to get a map task killed by reducer reports. So the first thing to do is to adjust the {{mapred.reduce.max.backoff}} parameter so that the map is not killed early. Other parameters we are working on are as follows:
* *Reducer-health*: There should be a way to decide how the reducer is performing. One such parameter is ({{num-fail-fetches/num-fetches}}). Roughly, a ratio > 50% conveys that the reducer is not performing well enough.
* *Reducer-progress*: There should be a way to decide how the reducer is progressing. One such parameter is ({{num-outputs-fetched/num-maps}}). Roughly, a ratio > 50% conveys that the reducer has made considerable progress.
* *Avg map completion time*: This time should determine when a fetch attempt should be considered failed and hence reported to the JT.
* *Num-reducers*: The number of reducers in a particular job might provide some insight into how contended the resources might be. A low number of reducers plus a failing output fetch for a single map indicates that the problem is map-sided. If the reducer is not able to fetch any map output, the problem is reducer-sided. If there are many reducers and many map-fetch failures, there is a high chance of congestion.

One thing to notice is that
* it requires ({{max-backoff * 3}}) time to kill a map.
* it requires 5 minutes (in the worst case) to kill a reducer when 5 fetches fail simultaneously.

A better strategy would be to use
* *avg-map-completion-time* as a parameter in deciding the time to report failure. {{max-backoff}} should also depend on the avg map completion time.
* *num-reducers* as a parameter in deciding how much to back off and whether the map should be killed or the reducer should back off (wait).
* *(num-maps - num-finished)* and *(num-fetch-fail / num-fetched)* as parameters in deciding when to kill the reducer. A good strategy would be to kill a reducer if it fails to fetch the output of 50% of the maps and not many map outputs have been fetched. It could be the case that the reducer has fetched the map outputs but with some failures; in that case the fetch-fail ratio will be higher, but the progress will also be considerable. We don't want to penalize a reducer which has fetched many map outputs with a lot of failures.
* *ratio-based-map-killing*: The JT should also kill a map based on some percentage along with the hard-coded count of 3. For example, kill a map if 50% of the reducers report failures and num-reports >= 3. It might also help the JT to have a global idea of which map outputs are being fetched, so that the scheduling of new tasks and the killing of maps can be decided.
* *fetch-success event notification*: The JT should be informed by a reducer about a successful map-output-fetch event, as a result of which the counters regarding the killing of that map should be reset. In a highly congested system, finding 3 reducers that fail in the first attempt for a particular map is easy.

A rough sketch of these checks is given below.
----
Comments?
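As a rough illustration of the heuristics above, here is a minimal sketch in Java. It is not Hadoop code; the class, method, and parameter names ({{WaitKillHeuristic}}, {{shouldKillReducer}}, etc.) are hypothetical, and the 50% thresholds and the 3x multiplier on avg map completion time are assumptions taken from the discussion above, not tuned values.

{code:java}
// Hypothetical sketch only -- none of these classes or methods exist in Hadoop.
// It illustrates the reducer-health / reducer-progress checks and the
// avg-map-completion-time based reporting delay discussed above.
public class WaitKillHeuristic {

  // Thresholds suggested in the discussion above (assumptions).
  private static final double FAIL_RATIO_THRESHOLD = 0.5;  // reducer-health
  private static final double PROGRESS_THRESHOLD   = 0.5;  // reducer-progress

  /** Reducer-health: num-fail-fetches / num-fetches. */
  static double failRatio(int numFailFetches, int numFetches) {
    return numFetches == 0 ? 0.0 : (double) numFailFetches / numFetches;
  }

  /** Reducer-progress: num-outputs-fetched / num-maps. */
  static double progress(int numOutputsFetched, int numMaps) {
    return numMaps == 0 ? 0.0 : (double) numOutputsFetched / numMaps;
  }

  /**
   * Kill the reducer only if it is failing on most of its fetches AND has
   * made little progress; a reducer that has already copied most outputs
   * (even with many failures) is not penalized.
   */
  static boolean shouldKillReducer(int numFailFetches, int numFetches,
                                   int numOutputsFetched, int numMaps) {
    return failRatio(numFailFetches, numFetches) > FAIL_RATIO_THRESHOLD
        && progress(numOutputsFetched, numMaps) < PROGRESS_THRESHOLD;
  }

  /**
   * Report a fetch failure to the JT only after waiting a multiple of the
   * average map completion time instead of a fixed backoff; the multiplier
   * of 3 here is an assumption.
   */
  static boolean shouldReportFetchFailure(long millisSinceFirstAttempt,
                                          long avgMapCompletionMillis) {
    return millisSinceFirstAttempt > 3 * avgMapCompletionMillis;
  }
}
{code}

The point of making the reducer-kill decision depend on both ratios is exactly the case called out above: a reducer with a high fetch-fail ratio but considerable progress should keep running.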
> Mappers fail easily due to repeated failures
> --------------------------------------------
>
>                 Key: HADOOP-2247
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2247
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.15.0
>         Environment: 1400 Node hadoop cluster
>            Reporter: Srikanth Kakani
>            Priority: Blocker
>             Fix For: 0.15.2
>
>
> Related to HADOOP-2220; the problem was introduced in HADOOP-1158.
> At this scale, hardcoding the number of fetch failures to a static number (in this case 3) is never going to work. Although the jobs we are running are loading the systems, 3 failures can randomly occur within the lifetime of a map. Even fetching the data can cause enough load for that many failures to occur.
> We believe that the number of tasks and the size of the cluster should be taken into account, and therefore that the ratio between total fetch attempts and total failed attempts should be taken into consideration.
> Given our experience, a task should be declared "Too many fetch failures" based on:
> failures > n /* could be 3 */ && (failures / total attempts) > k% /* could be 30-40% */
> Basically, the first factor gives some headstart to the second factor; the second factor then takes into account the cluster size and the task size.
> Additionally, we could take recency into account, say failures and attempts in the last one hour. We do not want to make it too small.
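For reference, the condition proposed in the quoted report translates into a very small check. This is a minimal sketch only: the names ({{FetchFailureCheck}}, {{tooManyFetchFailures}}) are hypothetical, not a Hadoop API, and the thresholds simply use the example values n = 3 and k = 30% given in the report.

{code:java}
// Hypothetical sketch of the condition proposed in the report above;
// class/method names and default thresholds are illustrative only.
public class FetchFailureCheck {

  private static final int    MIN_FAILURES  = 3;    // "n /* could be 3 */"
  private static final double FAIL_FRACTION = 0.30; // "k% /* could be 30-40% */"

  /**
   * Declare "Too many fetch failures" for a map only when both the absolute
   * failure count and the failure ratio exceed their thresholds: the count
   * gives the headstart, the ratio scales with cluster and job size.
   */
  static boolean tooManyFetchFailures(int failures, int totalAttempts) {
    return failures > MIN_FAILURES
        && totalAttempts > 0
        && (double) failures / totalAttempts > FAIL_FRACTION;
  }
}
{code}

Recency, as suggested in the report, could be layered on top by counting only failures and attempts from a recent window (say, the last hour) instead of job-lifetime totals.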