[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535094#comment-13535094 ]
Mariappan Asokan commented on MAPREDUCE-4842:
---------------------------------------------

Hi Jason, Thomas, and Siddharth,

Thanks for running the tests and reporting your findings.

My patch was intended to eliminate the race condition caused by the {{isInProgress()}} method in {{MergeThread}}. One cannot check the state of a thread and then take an action based on that state, because the state might change before the action is taken. The state check and the action should be atomic, so I came up with a solution that gets rid of that method. I did not intend to change the existing logic for when an in-memory merge is triggered, and I did not expect any performance improvement or degradation from this change. There might be a very small improvement in overall performance from eliminating the 'synchronized' calls. In any case, it simplifies the code.

Now going to Siddharth's comment:

{quote}
Asokan, one issue I can see with the patch - while a merge is in progress, every completed fetch will end up generating a single element list for the merger - effectively getting written out to its own file.
{quote}

You are right that such a scenario is possible. However, the fetcher thread will be waiting in {{waitForInMemoryMerge()}}, or it may get a stalled map output. This may mitigate the problem. I have an idea for eliminating this problem completely. I will verify that it works and post it as part of the patch later. It will be simple, I promise :)

Siddharth, you state:

{quote}
Also, there's a couple of exceptions from MergeThread.run during shutdown, which would need to be addressed, if this approach is being taken.
{quote}

Can you describe a scenario in which this might be a problem? We can address that too.

Once again, thanks to all of you.
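
To illustrate the check-then-act problem described above, here is a minimal sketch. It is not the actual {{MergeThread}} code and not what the patch does: the class {{MergeGateSketch}} and its methods {{tryStartMerge()}}, {{finishMerge()}}, and {{countWinners()}} are hypothetical names made up for this example. It only shows the general idea of folding the state check and the action into one atomic step, here with {{AtomicBoolean.compareAndSet}}.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; not the Hadoop MergeThread or MergeManager. */
public class MergeGateSketch {
    private final AtomicBoolean inProgress = new AtomicBoolean(false);

    /**
     * Racy if used as "if (!isInProgress()) { start a merge }":
     * another thread can start a merge between the check and the action.
     */
    public boolean isInProgress() {
        return inProgress.get();
    }

    /** Check and act in one atomic step: at most one caller gets true. */
    public boolean tryStartMerge() {
        return inProgress.compareAndSet(false, true);
    }

    /** Marks the merge as finished so that a later caller can start one. */
    public void finishMerge() {
        inProgress.set(false);
    }

    /** Releases the given number of threads at once; returns how many started a merge. */
    public static int countWinners(int threads) throws InterruptedException {
        MergeGateSketch gate = new MergeGateSketch();
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // line up all workers before racing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (gate.tryStartMerge()) {
                    winners.incrementAndGet();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            t.join();
        }
        return winners.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("winners = " + countWinners(8)); // prints "winners = 1"
    }
}
```

Because nothing calls {{finishMerge()}} during the race, exactly one of the eight threads sees {{tryStartMerge()}} return true. With a separate {{isInProgress()}} check followed by a separate start call, several threads could pass the check before any of them acted.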
-- Asokan

> Shuffle race can hang reducer
> -----------------------------
>
> Key: MAPREDUCE-4842
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.0.2-alpha, 0.23.5
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Blocker
> Attachments: mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.
> It looked similar to the problem described in MAPREDUCE-3721, where the
> fetchers were all being told to WAIT by the MergeManager but no merge was
> taking place.