[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4842:
Resolution: Fixed
Fix Version/s: 0.23.6, 2.0.3-alpha
Status: Resolved (was: Patch Available)

Thanks, Mariappan! I committed this to trunk, branch-2, and branch-0.23.

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Mariappan Asokan
Priority: Blocker
Fix For: 2.0.3-alpha, 0.23.6
Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch

Saw an instance where the shuffle caused multiple reducers in a job to hang. It looked similar to the problem described in MAPREDUCE-3721, where the fetchers were all being told to WAIT by the MergeManager but no merge was taking place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4842:
Attachment: MAPREDUCE-4842-2.patch

In the interest of trying to push this forward faster, here's another version of Asokan's patch with the unit test from the original patch added. I also implemented the change to use removeFirst() instead of getFirst(), and I fixed one more issue. The last patch had a race regarding inProgress: startMerge() could set it to true, but a merge could be completing simultaneously and smash it back to false. Then we'd run a merge without inProgress being true during the merge, which is Not Good when it comes to getting the fetchers to wait when they should.

This patch does not implement the pipelining idea yet, since the performance tests indicate that it might not be necessary to achieve equivalent performance. Implementing it should be fairly straightforward. For example, we could add a volatile mergeCount field that is incremented when merges complete. waitForMerge() would cache the value in a local on entry and return when either inProgress is false or mergeCount has changed (i.e., we are waiting for any active merge to complete, not all active merges).

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
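The mergeCount idea from the comment above can be sketched as follows. This is a minimal Java illustration under stated assumptions, not code from any attached patch: the class MergeCountWaiter and its methods are hypothetical, and a single monitor is assumed to guard both inProgress and the completion count.

```java
// Hypothetical sketch of "wait for any active merge, not all active merges".
// MergeCountWaiter and its methods are illustrative names, not Hadoop APIs.
class MergeCountWaiter {
  private boolean inProgress = false;   // guarded by this
  private volatile int mergeCount = 0;  // incremented each time a merge completes

  /** Marks a merge as active, under the same lock used for completion. */
  synchronized void startMerge() {
    inProgress = true;
  }

  /** Called when a merge finishes: record it and wake any waiting fetchers. */
  synchronized void mergeCompleted() {
    inProgress = false;
    mergeCount++;
    notifyAll();
  }

  /**
   * Returns once no merge is in progress or at least one merge has completed
   * since entry, i.e. it waits for any active merge rather than all of them.
   */
  synchronized void waitForMerge() throws InterruptedException {
    final int seen = mergeCount;  // cache the value in a local on entry
    while (inProgress && mergeCount == seen) {
      wait();
    }
  }

  int getMergeCount() {
    return mergeCount;
  }
}
```

Caching the count on entry matters once merges are pipelined: if the merge thread starts the next merge immediately after finishing one, a fetcher that wakes late still sees that the count moved and returns, instead of blocking until the new merge finishes too.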
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4842:
Assignee: Mariappan Asokan (was: Jason Lowe)
Status: Patch Available (was: Open)

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.5, 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Mariappan Asokan
Priority: Blocker
Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariappan Asokan updated MAPREDUCE-4842:
Attachment: mapreduce-4842.patch

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Mariappan Asokan
Priority: Blocker
Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariappan Asokan updated MAPREDUCE-4842:
Attachment: mapreduce-4842.patch

Hi Jason,

Thanks for the quick review of the patch. I put the list clearing in a synchronized block. I set {{inProgress}} to {{true}} before starting a merge. I shamelessly :) grabbed your unit test and incorporated it into the patch. Please take a look at it.

Thanks.
-- Asokan

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Mariappan Asokan
Priority: Blocker
Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariappan Asokan updated MAPREDUCE-4842:
Attachment: mapreduce-4842.patch

Made it more robust. Set {{inProgress}} to {{true}} at the end of {{startMerge()}} as well.

-- Asokan

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Mariappan Asokan
Priority: Blocker
Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariappan Asokan updated MAPREDUCE-4842:
Attachment: mapreduce-4842.patch

Hi Jason,

Thanks for your comments. I think the race condition exists because {{inProgress}} is a {{boolean}}. I changed it to an {{AtomicInteger}} and called it {{numPending}}. There should not be any more race conditions. Please provide your feedback.

Hi Siddharth,

I understand your concern about the time this is taking. If we fix this properly, we will not have to come back to this issue later. Jason seems to be reviewing my patch.

Thanks.
-- Asokan

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Mariappan Asokan
Priority: Blocker
Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
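A minimal sketch of replacing the boolean flag with a counter, assuming merge requests and completions are reported through two methods. Only the name numPending comes from the comment; the class and method names are hypothetical. With a boolean, the interleaving "request, request, finish" leaves the flag false while the second merge is still pending, whereas the counter stays at 1.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: count pending merges instead of using a single boolean.
// Not the code from the attached patch; names other than numPending are illustrative.
class PendingMergeCounter {
  private final AtomicInteger numPending = new AtomicInteger(0);

  /** Called when a merge is handed to the merge thread. */
  void mergeRequested() {
    numPending.incrementAndGet();
  }

  /** Called by the merge thread after it finishes one requested merge. */
  void mergeFinished() {
    // Decrement, but never below zero; report an unbalanced completion.
    int before = numPending.getAndUpdate(n -> n > 0 ? n - 1 : 0);
    if (before == 0) {
      throw new IllegalStateException("more merges finished than were requested");
    }
  }

  /** True while any requested merge has not yet finished. */
  boolean mergeInProgress() {
    return numPending.get() > 0;
  }
}
```

A counter alone gives fetchers nothing to block on; the waiting side would still need a lock and condition (or similar), which this sketch leaves out.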
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariappan Asokan updated MAPREDUCE-4842:
Attachment: mapreduce-4842.patch

I updated the patch. All the changes are in {{MergeManager}}. Here is an outline of the changes:

* Eliminated the line
{code}
commitMemory -= size;
{code}
in the {{unreserve()}} method.
Rationale: The complementary method {{reserve()}} only increments {{usedMemory}}, not {{commitMemory}}. Besides, {{commitMemory}} is used only to decide when we have enough shuffled map outputs in memory to trigger an in-memory merge.
* In {{closeInMemoryFile()}}, once an in-memory merge is submitted, {{commitMemory}} is set back to 0.
Rationale: If any fetcher thread sneaks in (past the in-memory merge's wait, because the in-memory merge has not started yet), it will be allowed to shuffle data to memory if memory was freed by the in-memory merger. The value of {{commitMemory}} will be incremented from 0, so another merge will not be triggered unless the number of bytes shuffled by the sneaked-in threads is greater than or equal to {{mergeThreshold}}. This makes sure that we do not start a merge prematurely.
* Added initialization of {{usedMemory}} and {{commitMemory}} in the constructor (though this is not needed, since Java zero-initializes these fields by default).

Please test this patch for any performance regression.

Thanks.
-- Asokan

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
Attachments: mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
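A simplified, hypothetical model of the memory accounting outlined in the comment above. Only the names usedMemory, commitMemory, mergeThreshold, reserve(), unreserve() and closeInMemoryFile() come from the comment. The class itself, the limit check in reserve() and the mergesSubmitted counter are assumptions made for illustration, and real MergeManager behavior (for example, shuffling oversized outputs to disk) is left out.

```java
// Hypothetical model of the usedMemory/commitMemory bookkeeping described above.
// The class, the limit check and mergesSubmitted are illustrative assumptions.
class ShuffleMemoryModel {
  private final long memoryLimit;
  private final long mergeThreshold;
  private long usedMemory = 0;     // bytes reserved by fetchers, in flight or committed
  private long commitMemory = 0;   // bytes committed since the last in-memory merge was submitted
  private int mergesSubmitted = 0;

  ShuffleMemoryModel(long memoryLimit, long mergeThreshold) {
    this.memoryLimit = memoryLimit;
    this.mergeThreshold = mergeThreshold;
  }

  /** Reserves room for a map output; false means the fetcher must WAIT. */
  synchronized boolean reserve(long size) {
    if (usedMemory + size > memoryLimit) {
      return false;
    }
    usedMemory += size;
    return true;
  }

  /** Releases memory. Mirrors reserve(): only usedMemory changes, commitMemory is untouched. */
  synchronized void unreserve(long size) {
    usedMemory -= size;
  }

  /** A fetched output is committed to memory; submit a merge once enough has accumulated. */
  synchronized void closeInMemoryFile(long size) {
    commitMemory += size;
    if (commitMemory >= mergeThreshold) {
      mergesSubmitted++;  // stand-in for handing the outputs to the in-memory merger
      commitMemory = 0;   // later ("sneaked-in") commits count up from zero again
    }
  }

  synchronized long usedMemory() { return usedMemory; }
  synchronized long commitMemory() { return commitMemory; }
  synchronized int mergesSubmitted() { return mergesSubmitted; }
}
```

Resetting commitMemory on submission is what keeps a small amount of data shuffled by sneaked-in fetchers from immediately triggering a second, premature merge.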
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariappan Asokan updated MAPREDUCE-4842:
Attachment: mapreduce-4842.patch

Hi Jason,

I have uploaded the patch with the caveat that it has not been stress tested :) You stated the following:
{quote}
We ran this patch through gridmix, and there are some indications it may negatively affect the performance of shuffle/merge for reducers. Not quite sure why, yet, as I haven't had time to investigate. Maybe since this patch checks for starting merges more often we end up starting merges too early and end up creating more work than if we wait for a fetcher to commit first?
{quote}
# Did you look at the log files for the messages logged from the {{startMerge()}} method in {{MergeThread}}? It tries to merge at most {{mergeFactor}} map outputs at a time. Do you see any differences in these messages with and without your patch, given that you suspect we end up starting merges too early?
# This is a tangent to point 1. The {{mergeFactor}} is set to the configured value for {{IntermediateMemoryToMemoryMerger}} but to Integer.MAX_VALUE for {{InMemoryMerger}} and {{OnDiskMerger}}. We have to find out the rationale behind these choices.
# You are right that my patch did not change the logic for when to start a merge. Let us compare the logs (with and without the patches) and go from there before drawing any conclusions.

Thanks for sharing the information.
-- Asokan

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
Attachments: mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
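The rule that a merge takes at most mergeFactor map outputs at a time can be illustrated with a small, hypothetical batching helper. It is not MergeThread.startMerge(); it only shows the batching rule, and it uses LinkedList.removeFirst() so that each output leaves the pending list exactly once. With mergeFactor = Integer.MAX_VALUE, as described for InMemoryMerger and OnDiskMerger, a single batch drains everything that is pending.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helper illustrating "merge at most mergeFactor map outputs at a time".
class MergeBatcher<T> {
  private final int mergeFactor;
  private final LinkedList<T> pending = new LinkedList<>();

  MergeBatcher(int mergeFactor) {
    this.mergeFactor = mergeFactor;
  }

  synchronized void add(T output) {
    pending.add(output);
  }

  /** Removes and returns up to mergeFactor pending outputs, oldest first. */
  synchronized List<T> nextBatch() {
    List<T> batch = new ArrayList<>();
    while (!pending.isEmpty() && batch.size() < mergeFactor) {
      // removeFirst() takes the element out of the list; getFirst() would only
      // peek, leaving it to be selected (and merged) again on the next call.
      batch.add(pending.removeFirst());
    }
    return batch;
  }
}
```

Comparing how many batches of what size get formed with and without a patch is one way to read the startMerge() log messages the comment asks about.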
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4842:
Status: Open (was: Patch Available)

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.5, 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Arun C Murthy
Priority: Blocker
Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4842:
Attachment: MAPREDUCE-4842.patch

Jason, nice unit test! Thanks! I've modified it a little to have 2 barriers (mergeStart and mergeComplete) rather than using the same one 4 times (that confused me a lot when I was reviewing it). Other than that, it looks great. +1

Also, if you don't mind, I'll assign the jira to you, since you've done all the heavy lifting and deserve way more credit than I do. Thanks again!

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Arun C Murthy
Priority: Blocker
Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
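The two-barrier structure Arun describes can be illustrated with java.util.concurrent.CyclicBarrier. This is a hypothetical stand-in, not the actual unit test: a fake merger waits on mergeStart before each merge and on mergeComplete after it, so the test decides exactly when a merge begins and knows when it has ended. Because a CyclicBarrier resets after it trips, each barrier can be reused every round while still naming a single phase.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the two-barrier test pattern; not the real test code.
class BarrierControlledMerger extends Thread {
  final CyclicBarrier mergeStart = new CyclicBarrier(2);    // test + merger
  final CyclicBarrier mergeComplete = new CyclicBarrier(2); // test + merger
  final AtomicInteger mergesDone = new AtomicInteger();
  private final int merges;

  BarrierControlledMerger(int merges) {
    this.merges = merges;
  }

  @Override
  public void run() {
    try {
      for (int i = 0; i < merges; i++) {
        mergeStart.await();           // wait for the test to release this merge
        mergesDone.incrementAndGet(); // stand-in for the real merge work
        mergeComplete.await();        // let the test know the merge finished
      }
    } catch (InterruptedException | BrokenBarrierException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Each await() call now names the phase it synchronizes, which is the readability gain over reusing one barrier for both the start and the end of every merge.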
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4842:
Attachment: MAPREDUCE-4842.patch

Thanks for the reviews, Alejandro and Arun. I updated the patch to address Alejandro's comment and also added a comment clarifying why the merge callback occurs outside of the lock and after inProgress is cleared, per a side discussion with Arun.

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
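A hypothetical sketch of the ordering mentioned above: inProgress is cleared and waiters are notified under the lock, and only then is the merge callback invoked with the lock released. The class, its method names and the rule that startMerge() does nothing while a merge is running are assumptions for illustration, not the patch's code or its stated rationale.

```java
// Hypothetical sketch: clear inProgress and notify under the lock, then run the
// merge callback without holding the lock. Names and the "skip if busy" rule are assumptions.
class MergeCompletionOrdering {
  private final Object lock = new Object();
  private boolean inProgress = false;
  private int mergesStarted = 0;

  /** Starts a merge unless one is already running; returns true if it started one. */
  boolean startMerge() {
    synchronized (lock) {
      if (inProgress) {
        return false;
      }
      inProgress = true;
      mergesStarted++;
      return true;
    }
  }

  /** Called at the end of one merge in the merge thread. */
  void finishMerge(Runnable onSuccessfulMerge) {
    synchronized (lock) {
      inProgress = false; // 1. clear the flag
      lock.notifyAll();   // 2. wake fetchers waiting on the lock
    }
    onSuccessfulMerge.run(); // 3. callback runs outside the lock, so it may call startMerge()
  }

  int mergesStarted() {
    synchronized (lock) {
      return mergesStarted;
    }
  }
}
```

In this sketch, invoking the callback before clearing inProgress would make its startMerge() call return false and silently drop the follow-up merge; running it outside the synchronized block also keeps whatever the callback does out of the critical section.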
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4842:
Attachment: MAPREDUCE-4842.patch

Great catch, Jason! Thanks! It seems like we are missing a hook in MergeThread.run to re-check the condition and trigger another merge at the end of the merge itself. Here is an illustrative patch. Thoughts?

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Attachments: MAPREDUCE-4842.patch
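A single-threaded, hypothetical model of the hang and of the proposed end-of-merge re-check. It assumes memory is full while the merge runs, so once the outputs committed during that merge are in, no further commits arrive to trigger a merge; without a re-check when the merge ends, nothing starts the next one and the fetchers keep waiting. The hook is modelled by a flag; the name onSuccessfulMerge in the comment is taken from the later rename of checkAndRestartMerge on this issue, and everything else is illustrative.

```java
// Hypothetical model: outputs committed while a merge runs do not start a new merge.
// Without a re-check when the merge ends, those outputs are never merged.
class MergeRecheckModel {
  private final long mergeThreshold;
  private final boolean recheckOnMergeEnd; // models the proposed hook in MergeThread.run
  private long commitMemory = 0;
  private boolean mergeRunning = false;
  private int mergesStarted = 0;

  MergeRecheckModel(long mergeThreshold, boolean recheckOnMergeEnd) {
    this.mergeThreshold = mergeThreshold;
    this.recheckOnMergeEnd = recheckOnMergeEnd;
  }

  /** A fetcher commits a map output of the given size to memory. */
  void commit(long size) {
    commitMemory += size;
    maybeStartMerge();
  }

  /** The running merge finishes. */
  void finishMerge() {
    mergeRunning = false;
    if (recheckOnMergeEnd) {
      maybeStartMerge(); // e.g. what a hook such as onSuccessfulMerge() could do
    }
  }

  private void maybeStartMerge() {
    if (!mergeRunning && commitMemory >= mergeThreshold) {
      mergeRunning = true;
      mergesStarted++;
      commitMemory = 0;
    }
  }

  boolean mergeRunning() { return mergeRunning; }
  long pendingBytes() { return commitMemory; }
  int mergesStarted() { return mergesStarted; }
}
```

With the re-check disabled, the model ends with bytes waiting to be merged and no merge running, which matches the symptom in the issue description of fetchers told to WAIT while no merge takes place.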
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4842:
Priority: Blocker (was: Major)

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Priority: Blocker
Attachments: MAPREDUCE-4842.patch
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4842:
Affects Version/s: 2.0.2-alpha (was: 2.0.3-alpha)

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Priority: Blocker
Attachments: MAPREDUCE-4842.patch
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4842:
Attachment: MAPREDUCE-4842.patch

Updated the patch to add a test case and rename checkAndRestartMerge to onSuccessfulMerge.

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Priority: Blocker
Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4842:
Assignee: Arun C Murthy
Target Version/s: 2.0.3-alpha, 0.23.6
Status: Patch Available (was: Open)

Shuffle race can hang reducer
-----------------------------
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.5, 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Arun C Murthy
Priority: Blocker
Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch