[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537186#comment-13537186 ]

Jason Lowe commented on MAPREDUCE-4842:
---------------------------------------

Patch race, sorry about that, Asokan. I took a look at your most recent patch;
a couple of comments:

* I see we're now clearing the lists when certain exceptions are caught, but
we're not holding the lock on those lists when doing so?
* Per my previous comment, I think there is a race involving inProgress where
a merge can run while the flag is set to false.
* The patch will need a unit test. Feel free to grab the test from my previous
patch, or roll your own if you have a cleaner one in mind.
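
To make the two locking points above concrete, here is a minimal Java sketch. It is not the code from either patch: `MergeStateSketch`, `tryStartMerge`, `finishMerge` and `clearOnError` are hypothetical names. It assumes a single monitor guards both the pending list and the inProgress flag, so the flag is set in the same critical section that hands the inputs to the merger, and the error path clears the list under that same lock.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Hadoop's MergeManager/MergeThread): one monitor
 * guards both the pending-input list and the inProgress flag.
 */
class MergeStateSketch<T> {
  private final List<T> pending = new ArrayList<>();
  private boolean inProgress = false;

  // Fetcher side: add an input while holding the lock.
  synchronized void add(T input) {
    pending.add(input);
  }

  /**
   * Check-and-set in one critical section: either this call claims the merge
   * (inProgress becomes true before any merging starts) and receives the
   * inputs, or it returns null and the caller does nothing.
   */
  synchronized List<T> tryStartMerge(int threshold) {
    if (inProgress || pending.size() < threshold) {
      return null;
    }
    inProgress = true;
    List<T> toMerge = new ArrayList<>(pending);
    pending.clear();
    return toMerge;
  }

  // Merger signals completion under the same lock.
  synchronized void finishMerge() {
    inProgress = false;
    notifyAll(); // wake any thread waiting on this monitor for the merge
  }

  // Error path: clear under the lock the writers use, never unsynchronized.
  synchronized void clearOnError() {
    pending.clear();
    inProgress = false;
    notifyAll();
  }

  synchronized boolean isInProgress() { return inProgress; }

  synchronized int pendingCount() { return pending.size(); }
}
```

The design choice here is simply that no code path reads or writes the list or the flag outside the monitor, which rules out both a concurrent clear during an add and a merge that starts before inProgress is visible as true.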
                
> Shuffle race can hang reducer
> -----------------------------
>
>                 Key: MAPREDUCE-4842
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Mariappan Asokan
>            Priority: Blocker
>         Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, 
> mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, 
> MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  
> It looked similar to the problem described in MAPREDUCE-3721, where the 
> fetchers were all being told to WAIT by the MergeManager but no merge was 
> taking place.
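
One way to state what went wrong in the report above: a fetcher should only be told to WAIT if some merge is running (or has just been started) that will eventually free memory and wake it. A minimal sketch of that invariant follows; `ShuffleMemorySketch`, `reserve`, `mergeFinished` and `canHang` are hypothetical and do not mirror the real MergeManager API, the hand-off to an actual merger thread is elided, and it assumes any single request fits within the limit.

```java
/**
 * Hypothetical sketch of the "WAIT implies a merge will run" invariant.
 * Names are illustrative, not Hadoop's API.
 */
class ShuffleMemorySketch {
  enum Decision { PROCEED, WAIT }

  private final long limit;
  private long used = 0;
  private boolean mergeInProgress = false;

  // Assumes every individual request is <= limit.
  ShuffleMemorySketch(long limit) {
    this.limit = limit;
  }

  synchronized Decision reserve(long bytes) {
    if (used + bytes > limit) {
      // Refusing is only safe if something will free memory later, so
      // start a merge here instead of leaving the fetcher waiting forever.
      if (!mergeInProgress) {
        mergeInProgress = true; // hand-off to a merger thread is elided
      }
      return Decision.WAIT;
    }
    used += bytes;
    return Decision.PROCEED;
  }

  // Merger finished: release memory and clear the flag together.
  synchronized void mergeFinished(long freedBytes) {
    used -= freedBytes;
    mergeInProgress = false;
    notifyAll();
  }

  // The hang condition: a request would be refused with no merge to wake it.
  synchronized boolean canHang(long nextRequest) {
    return used + nextRequest > limit && !mergeInProgress;
  }
}
```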

