[
https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xianyin Xin reopened MAPREDUCE-6485:
------------------------------------
> MR job hanged forever because all resources are taken up by reducers and the
> last map attempt never get resource to run
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 3.0.0, 2.4.1, 2.6.0, 2.7.1
> Reporter: Bob
> Priority: Critical
>
> The scenario is as follows:
> With mapreduce.job.reduce.slowstart.completedmaps=0.8 configured, reduces
> take resources and start running before all the maps have finished.
> It can then happen that all the resources are taken up by running reduces
> while one map is still unfinished.
> In this situation, the last map has two task attempts.
> The first attempt was killed due to a timeout (mapreduce.task.timeout), and
> its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP, so the failed
> map attempt will not be started again.
> The second attempt, which was started because map task speculation is
> enabled, is pending in the UNASSIGNED state because no resources are
> available. The second attempt's request has a lower priority than the
> reduces, so preemption does not happen.
> As a result, none of the reduces can finish because one map is left, and
> that last map hangs because no resources are available, so the job never
> finishes.
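
The scenario described above can be illustrated with a small, self-contained
toy model. This is only a sketch under stated assumptions and not the actual
MapReduce ApplicationMaster logic: the class SlowstartHangSketch, its
simulate() method, the tick-based scheduling, and the preemptReducerForMap
flag are all hypothetical names and simplifications introduced for
illustration. The model assumes that fresh maps are scheduled first, that
reduces start once the slowstart fraction of maps has completed, and that the
re-attempt of the timed-out map is served after the reduces.

```java
public class SlowstartHangSketch {

    public enum Outcome { FINISHED, HUNG }

    /**
     * Toy model (hypothetical, not Hadoop code). Each loop iteration is one
     * "tick": running tasks complete, then free containers are handed out.
     */
    public static Outcome simulate(int containers, int maps, int reduces,
                                   double slowstart, boolean preemptReducerForMap) {
        int freshMaps = maps;   // maps that have never been attempted
        int retryMaps = 0;      // re-attempts waiting for a container
        int runningMaps = 0, completedMaps = 0;
        int pendingReduces = reduces, runningReduces = 0, completedReduces = 0;
        boolean firstAttemptTimedOut = false;

        while (completedMaps < maps || completedReduces < reduces) {
            boolean progress = false;

            // Completion phase: running maps finish, except that one attempt
            // in the final wave is assumed to be killed by the task timeout.
            if (runningMaps > 0) {
                boolean lastWave = completedMaps + runningMaps == maps;
                if (lastWave && !firstAttemptTimedOut) {
                    firstAttemptTimedOut = true;
                    runningMaps--;
                    retryMaps++;
                }
                completedMaps += runningMaps;
                runningMaps = 0;
                progress = true;
            }
            // Reduces can only finish once every map has completed.
            if (completedMaps == maps && runningReduces > 0) {
                completedReduces += runningReduces;
                runningReduces = 0;
                progress = true;
            }

            // Scheduling phase, in assumed priority order.
            int free = containers - runningMaps - runningReduces;

            int grant = Math.min(free, freshMaps);          // 1. fresh maps
            freshMaps -= grant; runningMaps += grant; free -= grant;
            progress |= grant > 0;

            if (maps == 0 || (double) completedMaps / maps >= slowstart) {
                grant = Math.min(free, pendingReduces);     // 2. reduces
                pendingReduces -= grant; runningReduces += grant; free -= grant;
                progress |= grant > 0;
            }

            // Optional remedy: give up one reducer for the starved map.
            if (retryMaps > 0 && free == 0 && preemptReducerForMap
                    && runningReduces > 0) {
                runningReduces--; pendingReduces++; free++;
            }
            grant = Math.min(free, retryMaps);              // 3. map re-attempt
            retryMaps -= grant; runningMaps += grant; free -= grant;
            progress |= grant > 0;

            if (!progress) {
                return Outcome.HUNG; // nothing can run and nothing can finish
            }
        }
        return Outcome.FINISHED;
    }

    public static void main(String[] args) {
        // 4 containers, 10 maps, 6 reduces, slowstart 0.8
        System.out.println(simulate(4, 10, 6, 0.8, false)); // prints HUNG
        System.out.println(simulate(4, 10, 6, 0.8, true));  // prints FINISHED
    }
}
```

In this model the job hangs only when the reduces already hold every container
at the moment the last map needs a new attempt. Preempting a single reducer
for the pending map, or having enough containers to begin with, lets the job
finish.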