[jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer

Ravi Prakash (JIRA) Tue, 05 Mar 2013 15:52:15 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594110#comment-13594110
 ]


Ravi Prakash commented on MAPREDUCE-4842:
-----------------------------------------

Hi Mariappan,

bq. This is a tangent to point 1. The mergeFactor is set to the configured 
value for IntermediateMemoryToMemoryMerger but to Integer.MAX_VALUE for 
InMemoryMerger and OnDiskMerger. We have to find out the rationale behind these 
choices.

Thanks for all your work on the MergeManager. It is soooooo much cleaner now! 
Thanks much.

Anyway, since you have been working in this area of the code, could you 
please review MAPREDUCE-3685? The mergeFactor for the OnDiskMerger was 
wrong. For the InMemoryMerger it seems to be correct (because io.sort.factor is 
defined as "The number of streams to merge at once while sorting files. This 
determines the number of open file handles."). Besides, I wonder whether we 
really want to go down to the level of detail of the number of fetched cache 
lines, rather than simplifying by assuming constant-time access to all memory. 
Please consider continuing the discussion there.
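
To make the role of a merge factor concrete, here is a minimal sketch of a 
multi-pass merge that never merges more than `factor` sorted runs at once. It 
is not Hadoop's MergeManager code: the class `MergeFactorSketch` and its 
methods are hypothetical names chosen for illustration, and plain in-memory 
integer lists stand in for segments. With a small factor the runs are combined 
over several passes; with Integer.MAX_VALUE everything is merged in one pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not taken from Hadoop.
public class MergeFactorSketch {

    /** Repeatedly merges groups of at most `factor` sorted runs until one run remains. */
    static List<Integer> mergeAll(List<List<Integer>> runs, int factor) {
        if (factor < 2) {
            throw new IllegalArgumentException("factor must be >= 2");
        }
        List<List<Integer>> current = new ArrayList<>(runs);
        while (current.size() > 1) {
            List<List<Integer>> next = new ArrayList<>();
            for (int i = 0; i < current.size(); i += factor) {
                // long arithmetic avoids overflow when factor is Integer.MAX_VALUE
                int end = (int) Math.min((long) i + factor, current.size());
                next.add(mergeOnce(current.subList(i, end)));
            }
            current = next;
        }
        return current.isEmpty() ? new ArrayList<>() : current.get(0);
    }

    /** Single k-way merge; the heap holds one entry per run: {value, runIndex, offset}. */
    static List<Integer> mergeOnce(List<List<Integer>> group) {
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < group.size(); r++) {
            if (!group.get(r).isEmpty()) {
                heap.add(new int[] {group.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            List<Integer> run = group.get(top[1]);
            int nextOffset = top[2] + 1;
            if (nextOffset < run.size()) {
                heap.add(new int[] {run.get(nextOffset), top[1], nextOffset});
            }
        }
        return out;
    }

    /** Number of passes needed to reduce `runCount` runs to one with the given factor. */
    static int passes(int runCount, int factor) {
        int passes = 0;
        while (runCount > 1) {
            // ceiling division without overflow for very large factors
            runCount = runCount / factor + (runCount % factor == 0 ? 0 : 1);
            passes++;
        }
        return passes;
    }
}
```

In this sketch a smaller factor bounds how many runs are open at once (the 
concern behind "number of open file handles" for on-disk data) at the cost of 
extra passes over the data.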

Thanks


                
> Shuffle race can hang reducer
> -----------------------------
>
>                 Key: MAPREDUCE-4842
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Mariappan Asokan
>            Priority: Blocker
>             Fix For: 2.0.3-alpha, 0.23.6
>
>         Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, 
> mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, 
> mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, 
> MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  
> It looked similar to the problem described in MAPREDUCE-3721, where the 
> fetchers were all being told to WAIT by the MergeManager but no merge was 
> taking place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
