[jira] [Commented] (MAPREDUCE-4560) Job can get stuck in a deadlock between mappers and reducers for low values of mapreduce.job.reduce.slowstart.completedmaps (<<1)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469511#comment-13469511 ]

Rahul Jain commented on MAPREDUCE-4560:
---

Yes, this issue was found with the FIFO scheduler; we can mark it as a duplicate of MAPREDUCE-4299 once we verify that the fix resolves the issue.

> Job can get stuck in a deadlock between mappers and reducers for low values
> of mapreduce.job.reduce.slowstart.completedmaps (<<1)
> -
>
> Key: MAPREDUCE-4560
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4560
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Rahul Jain
> Fix For: 2.0.0-alpha
>
>
> This issue has been seen with MapReduceV2, never with MapReduceV1, in our lab
> systems.
> The parameter mapreduce.job.reduce.slowstart.completedmaps was set to 0.05
> (the default value).
> We found the Application Master stuck in a deadlock between mappers and
> reducers, with no progress in the job; the sequence appears to be:
> 1. The initially available map/reduce slots were allocated to mappers.
> 2. Once mappers made progress and a few of them completed, reducers started
> occupying a few of the slots because of the low value of the above config param.
> 3. The scheduler appears not to give priority to mappers over reducers; after
> a while in our system we saw all slots occupied by reducers.
> 4. Since there were still mapper tasks not yet assigned any slot, the map
> phase never completed.
> 5. The system entered a deadlock state where reducers occupy all available
> slots but are waiting for mappers to complete; mappers cannot move
> forward because no slot is available.
> The workaround in our system was to set
> mapreduce.job.reduce.slowstart.completedmaps=1, after which the issue was no
> longer seen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
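The five-step sequence in the issue description above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual Application Master or FIFO scheduler code: the function simulate and all of its parameters are hypothetical, every map is assumed to take one tick, and reducers are assumed to take free slots ahead of pending maps once the slowstart threshold is reached and to hold them until every map has completed.

```python
def simulate(total_slots, num_maps, num_reduces, slowstart, max_ticks=1000):
    """Toy slot model (hypothetical, not Hadoop code).

    Returns "completed" once every map has finished, "deadlock" when
    reducers hold every slot while maps are still pending, or "timeout".
    Assumes num_maps > 0.
    """
    pending_maps = num_maps
    running_maps = 0
    completed_maps = 0
    pending_reduces = num_reduces
    running_reduces = 0

    for _ in range(max_ticks):
        # Assumption: every running map finishes in one tick and frees its slot.
        completed_maps += running_maps
        running_maps = 0

        # Once all maps are done, reducers are no longer blocked.
        if completed_maps == num_maps:
            return "completed"

        free = total_slots - running_maps - running_reduces

        # Reducers become eligible once the completed-map fraction reaches
        # the slowstart threshold. Assumption: no priority for maps, so
        # eligible reducers take free slots first and never release them.
        if completed_maps / num_maps >= slowstart:
            started = min(free, pending_reduces)
            pending_reduces -= started
            running_reduces += started
            free -= started

        # Whatever is left goes to pending maps.
        started = min(free, pending_maps)
        pending_maps -= started
        running_maps += started

        # Steps 4 and 5: maps still pending, none running, no slot left.
        if pending_maps > 0 and running_maps == 0 and running_reduces == total_slots:
            return "deadlock"

    return "timeout"


if __name__ == "__main__":
    # Made-up cluster: 4 slots, 20 maps, 8 reduces.
    print(simulate(4, 20, 8, slowstart=0.05))
    print(simulate(4, 20, 8, slowstart=1.0))
```

In this model, a threshold of 0.05 lets reducers fill all four slots after the first wave of maps, while a threshold of 1 (the reported workaround) keeps reducers out until the map phase is finished. The model also shows that the deadlock needs at least as many eligible reducers as slots; with fewer reducers, maps keep making progress on the remaining slots.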
[jira] [Commented] (MAPREDUCE-4560) Job can get stuck in a deadlock between mappers and reducers for low values of mapreduce.job.reduce.slowstart.completedmaps (<<1)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435848#comment-13435848 ]

nemon lou commented on MAPREDUCE-4560:
--

Do you use the FIFO scheduler? If so, have a look at MAPREDUCE-4299.