[jira] [Commented] (MAPREDUCE-4560) Job can get stuck in a deadlock between mappers and reducers for low values of mapreduce.job.reduce.slowstart.completedmaps (<<1)

2012-10-04 Thread Rahul Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469511#comment-13469511
 ] 

Rahul Jain commented on MAPREDUCE-4560:
---

Yes, this issue was found with the FIFO scheduler; we can mark it as a 
duplicate of MAPREDUCE-4299 once we verify that its fix resolves the issue.

> Job can get stuck in a deadlock between mappers and reducers for low values 
> of mapreduce.job.reduce.slowstart.completedmaps (<<1)
> -
>
> Key: MAPREDUCE-4560
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4560
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Rahul Jain
> Fix For: 2.0.0-alpha
>
>
> This issue has been seen with MapReduceV2, never with MapReduceV1, in our lab 
> systems.
> The parameter mapreduce.job.reduce.slowstart.completedmaps was set to 0.05 
> (the default value).
> We found the Application Master stuck in a deadlock between mappers and 
> reducers, with no progress in the job; the sequence appears to be:
> 1. The initially available map/reduce slots were allocated to mappers.
> 2. Once mappers made progress and a few of them completed, reducers started 
> occupying a few of the slots because of the low value of the above config 
> param.
> 3. The scheduler appears not to give priority to mappers over reducers; after 
> a while in our system we saw all slots occupied by reducers.
> 4. Since there were still mapper tasks not yet assigned any slot, the map 
> phase never completed.
> 5. The system entered a deadlock state where reducers occupy all available 
> slots but are waiting for the mappers to complete; mappers cannot move 
> forward because no slot is available.
> The workaround in our system was to set 
> mapreduce.job.reduce.slowstart.completedmaps=1, after which the issue was no 
> longer seen.
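
The sequence described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual MRv2 scheduler or Application Master logic: the function name simulate and all of its parameters are hypothetical, every map is assumed to finish in one tick, reducers are assumed to hold their slot until all maps have completed, and freed slots are assumed to go to eligible reducers before pending mappers (the worst case of step 3).

```python
def simulate(slots, num_maps, num_reduces, slowstart, max_ticks=10_000):
    """Toy single-job model; returns 'completed', 'deadlock' or 'unknown'.

    Hypothetical illustration only, not Hadoop code.
    """
    pending_maps = num_maps
    pending_reduces = num_reduces
    running_maps = 0
    running_reduces = 0
    completed_maps = 0

    for _ in range(max_ticks):
        # Assumption: every running map finishes within one tick.
        completed_maps += running_maps
        running_maps = 0

        if completed_maps == num_maps:
            # Map phase is done, so waiting reducers can now make progress.
            return "completed"

        free = slots - running_reduces

        # Reducers become eligible once the slowstart fraction of maps is done.
        # Assumption: the scheduler hands free slots to reducers first.
        if completed_maps >= slowstart * num_maps:
            started = min(free, pending_reduces)
            running_reduces += started
            pending_reduces -= started
            free -= started

        # Whatever is left goes to pending mappers.
        started = min(free, pending_maps)
        running_maps += started
        pending_maps -= started

        # Reducers hold every slot while maps are still waiting: no progress.
        if running_maps == 0 and pending_maps > 0:
            return "deadlock"

    return "unknown"


if __name__ == "__main__":
    # 10 slots, 100 maps, 20 reducers
    print(simulate(10, 100, 20, slowstart=0.05))  # deadlock
    print(simulate(10, 100, 20, slowstart=1.0))   # completed
```

In this model the deadlock needs at least as many reducers as slots; with fewer reducers, some slots always remain for mappers and the job finishes. Setting the threshold to 1 keeps reducers out until the map phase is over, which matches the workaround reported above.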

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4560) Job can get stuck in a deadlock between mappers and reducers for low values of mapreduce.job.reduce.slowstart.completedmaps (<<1)

2012-08-16 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435848#comment-13435848
 ] 

nemon lou commented on MAPREDUCE-4560:
--

Do you use the FIFO scheduler?
If so, have a look at MAPREDUCE-4299.

