[ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542895 ]
Owen O'Malley commented on HADOOP-2141:
---------------------------------------

{quote}
There are always going to be such corner cases, and proper speculative execution is the right solution to such problems.
{quote}

This is exactly backwards. Speculative execution absolutely can *not* be used as a reliability solution. Applications can and do turn it off; therefore, the system must be completely reliable without speculative execution. Furthermore, if there is a "freezing" problem, it may well strike the speculative task as well, which would lock up the job. Speculative execution is a pure optimization. It is not for reliability.

> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.15.0
>            Reporter: Koji Noguchi
>            Assignee: Arun C Murthy
>             Fix For: 0.16.0
>
>
> We had one job hang with speculative execution enabled.
> 4 reduce tasks were stuck at 95% completion because of a bad disk.
> Devaraj pointed out:
> bq. One of the conditions that must be met for launching a speculative instance of a task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also started when tasks stop making progress.
> Devaraj suggested:
> bq. Maybe we should introduce a condition on average completion time for tasks in the speculative execution check.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
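The improvement discussed above can be sketched as a predicate that launches a speculative attempt either under the existing "20% behind average progress" rule, or when an attempt has stopped reporting progress for too long. This is a minimal illustration, not Hadoop's actual JobInProgress code; the class name, method signature, and the 60-second stall threshold are assumptions made for the example.

```java
// Hypothetical sketch of the speculation check discussed in HADOOP-2141.
// Not the real Hadoop implementation; names and thresholds are illustrative.
public class SpeculationCheck {
    // Existing rule from the issue: task must be >= 20% behind average progress.
    static final double PROGRESS_GAP = 0.20;
    // Assumed stall threshold for the proposed "stopped making progress" rule.
    static final long STALL_THRESHOLD_MS = 60_000;

    /**
     * @param taskProgress     progress of this task attempt, 0.0 to 1.0
     * @param averageProgress  average progress of sibling task attempts
     * @param nowMs            current wall-clock time in milliseconds
     * @param lastProgressMs   last time this attempt reported progress
     */
    static boolean shouldSpeculate(double taskProgress, double averageProgress,
                                   long nowMs, long lastProgressMs) {
        boolean farBehind = averageProgress - taskProgress >= PROGRESS_GAP;
        boolean stalled   = (nowMs - lastProgressMs) >= STALL_THRESHOLD_MS;
        return farBehind || stalled;
    }

    public static void main(String[] args) {
        // A reduce stuck at 95% while siblings average 97%: the gap rule alone
        // would never fire (gap is only 2%), but the stall check would.
        System.out.println(shouldSpeculate(0.95, 0.97, 100_000, 10_000)); // stalled 90s -> true
        System.out.println(shouldSpeculate(0.95, 0.97, 100_000, 90_000)); // recent progress -> false
    }
}
```

Note that this only addresses the start-up condition; as the comment above stresses, speculation remains an optimization, and a freezing bug can strike the speculative attempt too.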