[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535094#comment-13535094
 ] 

Mariappan Asokan commented on MAPREDUCE-4842:
---------------------------------------------

Hi Jason, Thomas, and Siddharth,
  Thanks for running the tests and reporting your findings.  My patch was 
intended to eliminate the race condition due to the {{isInProgress()}} method 
in {{MergeThread}}.  One cannot check the state of a thread and then take an 
action based on the state because the state might change before the action is 
taken.  The state checking and action should be atomic.  So I came up with a 
solution to get rid of that method.
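
To illustrate the check-then-act problem in isolation, here is a minimal 
sketch.  {{MergeGate}}, {{tryStart()}}, and {{finish()}} are hypothetical names 
and are not taken from the patch or from {{MergeThread}}.  The sketch uses 
{{AtomicBoolean.compareAndSet}} as one possible way to make the check and the 
action a single step; the patch itself may do this differently.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; this is not Hadoop's MergeThread.
class MergeGate {
    private final AtomicBoolean inProgress = new AtomicBoolean(false);

    // Racy when used as "if (!isInProgress()) { start(); }":
    // another thread can change the state between the check and the action.
    boolean isInProgress() {
        return inProgress.get();
    }

    // Atomic check-and-act: only one caller can flip false -> true,
    // so at most one caller is told that it may start a merge.
    boolean tryStart() {
        return inProgress.compareAndSet(false, true);
    }

    // Called when the merge is done so that the next merge can start.
    void finish() {
        inProgress.set(false);
    }

    public static void main(String[] args) {
        MergeGate gate = new MergeGate();
        System.out.println("first caller may start:  " + gate.tryStart());
        System.out.println("second caller may start: " + gate.tryStart());
        gate.finish();
        System.out.println("after finish may start:  " + gate.tryStart());
    }
}
```

With this shape a caller never needs a separate state query, so there is no 
window between checking the state and acting on it.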

I was not intending to change the existing logic on when an in-memory merge is 
triggered.  Also, I was not expecting any performance improvement or 
degradation due to this change.  At most, there might be a slight improvement 
in overall performance from eliminating the 'synchronized' calls.  The main 
benefit is that it simplifies the code.

Now going to Siddharth's comment:
{quote}
Asokan, one issue I can see with the patch - while a merge is in progress, 
every completed fetch will end up generating a single element list for the 
merger - effectively getting written out to its own file.
{quote}
You are right that such a scenario is possible.  However, the fetcher thread 
will either be waiting in {{waitForInMemoryMerge()}} or be handed a stalled 
map output, which may mitigate the problem.  I have an idea for eliminating 
this problem completely.  I will verify that it works and post it as part of 
the patch later.  It will be simple, I promise :)

Siddharth, you state:
{quote}
Also, there's a couple of exceptions from MergeThread.run during shutdown, 
which would need to be addressed, if this approach is being taken.
{quote}
Can you describe a scenario in which this might be a problem?  We can address 
that too.

Once again, thanks to all of you.

-- Asokan

                
> Shuffle race can hang reducer
> -----------------------------
>
>                 Key: MAPREDUCE-4842
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: mapreduce-4842.patch, MAPREDUCE-4842.patch, 
> MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  
> It looked similar to the problem described in MAPREDUCE-3721, where the 
> fetchers were all being told to WAIT by the MergeManager but no merge was 
> taking place.

