[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-21 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4842:
--

   Resolution: Fixed
Fix Version/s: 0.23.6
   2.0.3-alpha
   Status: Resolved  (was: Patch Available)

Thanks, Mariappan!  I committed this to trunk, branch-2, and branch-0.23.

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Mariappan Asokan
Priority: Blocker
 Fix For: 2.0.3-alpha, 0.23.6

 Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, 
 mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, 
 mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, 
 MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch


 Saw an instance where the shuffle caused multiple reducers in a job to hang.  
 It looked similar to the problem described in MAPREDUCE-3721, where the 
 fetchers were all being told to WAIT by the MergeManager but no merge was 
 taking place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-20 Thread Jason Lowe (JIRA)


Jason Lowe updated MAPREDUCE-4842:
--

Attachment: MAPREDUCE-4842-2.patch

In the interest of pushing this forward faster, here's another version of 
Asokan's patch with the unit test from the original patch added.  I also 
implemented the removeFirst() instead of getFirst() change, and I fixed one 
more issue.  The last patch had a race regarding inProgress: startMerge() could 
set it to true while a merge was simultaneously completing and smashing it back 
to false.  We would then run a merge without inProgress being true during the 
merge, which is Not Good when it comes to getting the fetchers to wait when 
they should.

This patch does not implement the pipelining idea yet since the performance 
tests indicate that it might not be necessary to achieve equivalent 
performance.  Implementing it should be fairly straightforward.  For example, 
we could add a volatile mergeCount field that is incremented when merges 
complete.  waitForMerge() would cache the value in a local on entry and return 
when either inProgress is false or mergeCount has changed (i.e., we are waiting 
for any active merge to complete, not all active merges).
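The pipelining idea above can be sketched as follows. This is a minimal sketch that assumes a single monitor guards all merge state; only mergeCount, inProgress and waitForMerge() come from the comment, while MergeCoordinator, mergeStarted(), mergeFinished() and PipelinedWaitDemo are hypothetical names, and inProgress is derived here from a count of active merges.

```java
// Hypothetical sketch: a fetcher waits for ANY active merge to finish,
// not for all of them. Names other than mergeCount/waitForMerge are invented.
class MergeCoordinator {
  private int activeMerges = 0;          // merges currently running
  private volatile long mergeCount = 0;  // bumped every time a merge completes

  synchronized void mergeStarted() {
    activeMerges++;
  }

  synchronized void mergeFinished() {
    activeMerges--;
    mergeCount++;
    notifyAll();  // wake fetchers blocked in waitForMerge()
  }

  synchronized boolean inProgress() {
    return activeMerges > 0;
  }

  // Cache mergeCount on entry; return once nothing is merging or at least
  // one merge has completed since we started waiting.
  synchronized void waitForMerge() throws InterruptedException {
    long startCount = mergeCount;
    while (activeMerges > 0 && mergeCount == startCount) {
      wait();
    }
  }
}

class PipelinedWaitDemo {
  // Two merges are in flight and only one completes. Returns true if the
  // waiting fetcher was released while the other merge is still running.
  static boolean releasedAfterOneOfTwoMerges() {
    MergeCoordinator c = new MergeCoordinator();
    c.mergeStarted();
    c.mergeStarted();
    Thread fetcher = new Thread(() -> {
      try {
        c.waitForMerge();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    fetcher.start();
    // Spin until the fetcher is parked in wait(), i.e. it has cached mergeCount.
    while (fetcher.getState() != Thread.State.WAITING) {
      Thread.yield();
    }
    c.mergeFinished();
    try {
      fetcher.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !fetcher.isAlive() && c.inProgress();
  }

  public static void main(String[] args) {
    System.out.println(releasedAfterOneOfTwoMerges());  // prints true
  }
}
```

The demo releases the fetcher after the first completion even though a second merge is still running, which is the "any merge, not all merges" behavior the comment describes.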

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, 
 mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, 
 MAPREDUCE-4842.patch, MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-20 Thread Jason Lowe (JIRA)


Jason Lowe updated MAPREDUCE-4842:
--

Assignee: Mariappan Asokan  (was: Jason Lowe)
  Status: Patch Available  (was: Open)

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.5, 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Mariappan Asokan
Priority: Blocker
 Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, 
 mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, 
 MAPREDUCE-4842.patch, MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-20 Thread Mariappan Asokan (JIRA)


Mariappan Asokan updated MAPREDUCE-4842:


Attachment: mapreduce-4842.patch

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Mariappan Asokan
Priority: Blocker
 Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, 
 mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, 
 MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-20 Thread Mariappan Asokan (JIRA)


Mariappan Asokan updated MAPREDUCE-4842:


Attachment: mapreduce-4842.patch

Hi Jason,
  Thanks for the quick review of the patch.  I put the list clearing in a 
synchronized block and set {{inProgress}} to {{true}} before starting a merge.  
I shamelessly :) grabbed your unit test and incorporated it into the patch.  
Please take a look at it.
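A minimal sketch of why the list clearing belongs in a synchronized block, assuming fetchers append committed outputs to a shared list that the merger periodically drains. InMemoryOutputs, drainForMerge() and DrainDemo are hypothetical names. If the copy and the clear were not done under the same lock as add(), an output added between the two steps would be silently dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared list of committed in-memory outputs.
class InMemoryOutputs<T> {
  private final List<T> outputs = new ArrayList<>();

  synchronized void add(T output) {
    outputs.add(output);
  }

  // Snapshot and clear under the same lock used by add(), so nothing
  // can be appended between the copy and the clear.
  synchronized List<T> drainForMerge() {
    List<T> batch = new ArrayList<>(outputs);
    outputs.clear();
    return batch;
  }

  synchronized int size() {
    return outputs.size();
  }
}

class DrainDemo {
  // Four "fetcher" threads add outputs while this thread drains repeatedly.
  // Returns how many outputs are accounted for (drained + still queued).
  static int accountedFor(int perFetcher) {
    InMemoryOutputs<Integer> shared = new InMemoryOutputs<>();
    List<Thread> fetchers = new ArrayList<>();
    for (int f = 0; f < 4; f++) {
      Thread t = new Thread(() -> {
        for (int i = 0; i < perFetcher; i++) {
          shared.add(i);
        }
      });
      fetchers.add(t);
      t.start();
    }
    int drained = 0;
    boolean running = true;
    while (running) {
      drained += shared.drainForMerge().size();
      running = fetchers.stream().anyMatch(Thread::isAlive);
    }
    return drained + shared.size();
  }

  public static void main(String[] args) {
    System.out.println(accountedFor(10_000));  // prints 40000
  }
}
```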

Thanks.

-- Asokan


 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Mariappan Asokan
Priority: Blocker
 Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, 
 mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, 
 MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, 
 MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-20 Thread Mariappan Asokan (JIRA)


Mariappan Asokan updated MAPREDUCE-4842:


Attachment: mapreduce-4842.patch

Made it more robust.  Set {{inProgress}} to {{true}} at the end of 
{{startMerge()}} as well.

-- Asokan


 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Mariappan Asokan
Priority: Blocker
 Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, 
 mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, 
 mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, 
 MAPREDUCE-4842.patch, MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-20 Thread Mariappan Asokan (JIRA)


Mariappan Asokan updated MAPREDUCE-4842:


Attachment: mapreduce-4842.patch

Hi Jason,
  Thanks for your comments.  I think the race condition exists because 
{{inProgress}} is a {{boolean}}.  I changed it to an {{AtomicInteger}} and 
called it {{numPending}}.  There should be no remaining race condition.  Please 
provide your feedback.
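The difference between a boolean flag and a pending-merge counter can be shown by replaying, sequentially, an interleaving in which merge A is running, a second merge B is requested, and then A completes. This is a minimal sketch: only numPending and AtomicInteger come from the comment, while BooleanFlag, PendingCounter and FlagVsCounterDemo are hypothetical. It covers the bookkeeping only, not how waiting fetchers are woken.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical boolean-based tracking: the last writer wins.
class BooleanFlag {
  private volatile boolean inProgress = false;
  void mergeRequested() { inProgress = true; }
  void mergeCompleted() { inProgress = false; }
  boolean inProgress() { return inProgress; }
}

// Hypothetical counter-based tracking in the spirit of numPending.
class PendingCounter {
  private final AtomicInteger numPending = new AtomicInteger();
  void mergeRequested() { numPending.incrementAndGet(); }
  void mergeCompleted() { numPending.decrementAndGet(); }
  boolean inProgress() { return numPending.get() > 0; }
}

class FlagVsCounterDemo {
  // Returns {flag view, counter view} of "a merge is in progress"
  // after merge A completes while merge B is still pending.
  static boolean[] afterFirstCompletion() {
    BooleanFlag flag = new BooleanFlag();
    PendingCounter counter = new PendingCounter();
    flag.mergeRequested();     // merge A starts
    counter.mergeRequested();
    flag.mergeRequested();     // merge B requested while A runs
    counter.mergeRequested();
    flag.mergeCompleted();     // merge A completes
    counter.mergeCompleted();
    return new boolean[] { flag.inProgress(), counter.inProgress() };
  }

  public static void main(String[] args) {
    boolean[] r = afterFirstCompletion();
    System.out.println("flag=" + r[0] + " counter=" + r[1]);  // flag=false counter=true
  }
}
```

With the flag, A's completion erases B's request; with the counter, one merge is still accounted for.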

Hi Siddharth,
  I understand your concern about how long this is taking.  If we fix this 
properly now, we will not have to come back to this issue later.  Jason seems to 
be reviewing my patch.

Thanks.

-- Asokan


 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Mariappan Asokan
Priority: Blocker
 Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, 
 mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, 
 mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, 
 MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-18 Thread Mariappan Asokan (JIRA)


Mariappan Asokan updated MAPREDUCE-4842:


Attachment: mapreduce-4842.patch

I updated the patch.  All the changes are in {{MergeManager}}.  Here is an 
outline of the changes:
* Eliminated the line
{code}
commitMemory -= size;
{code}
in the {{unreserve()}} method.  Rationale: the complementary method {{reserve()}} 
only increments {{usedMemory}}, not {{commitMemory}}.  Besides, {{commitMemory}} 
is used only to decide when we have enough shuffled map outputs in memory to 
trigger an in-memory merge.
* In {{closeInMemoryFile()}}, once an in-memory merge is submitted, 
{{commitMemory}} is set back to 0.  Rationale: if any fetcher thread sneaks in 
(past the in-memory merge's wait, because the in-memory merge has not started 
yet), it will be allowed to shuffle data to memory if memory was freed by the 
in-memory merger.  The value of {{commitMemory}} will be incremented from 0, so 
another merge will not be triggered unless the number of bytes shuffled by the 
sneaked-in threads is greater than or equal to {{mergeThreshold}}.  This 
ensures that we do not start a merge prematurely.
* Added initialization of {{usedMemory}} and {{commitMemory}} in the 
constructor (though this is not strictly needed, since Java zero-initializes 
these fields by default).
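The accounting described in the list above can be modeled with a small sketch, assuming a single lock around all counters. Only usedMemory, commitMemory, mergeThreshold, reserve(), unreserve() and closeInMemoryFile() are named in the comment; MemoryAccounting, mergesSubmitted and the method bodies are hypothetical simplifications, not the actual MergeManager code.

```java
// Hypothetical model of the revised usedMemory/commitMemory accounting.
class MemoryAccounting {
  private final long mergeThreshold;
  private long usedMemory = 0;    // bytes reserved by fetchers and not yet freed
  private long commitMemory = 0;  // bytes committed since the last merge was submitted
  private int mergesSubmitted = 0;

  MemoryAccounting(long mergeThreshold) {
    this.mergeThreshold = mergeThreshold;
  }

  // reserve() only touches usedMemory ...
  synchronized void reserve(long size) {
    usedMemory += size;
  }

  // ... so unreserve() only undoes usedMemory (no "commitMemory -= size").
  synchronized void unreserve(long size) {
    usedMemory -= size;
  }

  // A fetcher finished writing a map output of `size` bytes into memory.
  synchronized void closeInMemoryFile(long size) {
    commitMemory += size;
    if (commitMemory >= mergeThreshold) {
      mergesSubmitted++;  // stands in for submitting the in-memory merge
      commitMemory = 0;   // late fetchers start counting from zero again
    }
  }

  synchronized long usedMemory() { return usedMemory; }
  synchronized long commitMemory() { return commitMemory; }
  synchronized int mergesSubmitted() { return mergesSubmitted; }

  public static void main(String[] args) {
    MemoryAccounting m = new MemoryAccounting(100);
    m.reserve(60);
    m.closeInMemoryFile(60);   // commitMemory = 60, no merge yet
    m.reserve(50);
    m.closeInMemoryFile(50);   // 110 >= 100: merge submitted, reset to 0
    m.reserve(30);
    m.closeInMemoryFile(30);   // sneaked-in fetcher: 30 < 100, no merge
    m.unreserve(110);          // merger frees the first two outputs
    System.out.println(m.mergesSubmitted() + " " + m.commitMemory() + " " + m.usedMemory());
    // prints "1 30 30"
  }
}
```

In this trace the third output arrives after the merge was submitted and only brings commitMemory to 30, so no premature second merge is triggered.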

Please test this patch for any performance regression.

Thanks.

-- Asokan


 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: mapreduce-4842.patch, mapreduce-4842.patch, 
 MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, 
 MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-06 Thread Mariappan Asokan (JIRA)


Mariappan Asokan updated MAPREDUCE-4842:


Attachment: mapreduce-4842.patch

Hi Jason,
 I have uploaded the patch, with the caveat that it has not been stress tested :)

You stated the following:
{quote}
We ran this patch through gridmix, and there are some indications it may 
negatively affect the performance of shuffle/merge for reducers. Not quite sure 
why, yet, as I haven't had time to investigate. Maybe since this patch checks 
for starting merges more often we end up starting merges too early and end up 
creating more work than if we wait for a fetcher to commit first?
{quote}

# Did you look at the log files for the messages logged from the 
{{startMerge()}} method in {{MergeThread}}? It tries to merge at most 
{{mergeFactor}} map outputs at a time. Since you suspect that we end up 
starting merges too early, do you see any differences in those messages with 
and without your patch?

# This is a tangent to point 1. The {{mergeFactor}} is set to the configured 
value for {{IntermediateMemoryToMemoryMerger}} but to {{Integer.MAX_VALUE}} for 
{{InMemoryMerger}} and {{OnDiskMerger}}. We have to find out the rationale 
behind these choices.

# You are right that in my patch I did not make any change to the logic on when 
to start the merge.
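To make the {{mergeFactor}} point concrete, here is a hedged sketch of selecting at most mergeFactor pending outputs for one merge pass and leaving the rest queued. BoundedMergeSelector and selectForMerge() are hypothetical; the real selection logic in MergeThread may differ. With Integer.MAX_VALUE, as used for InMemoryMerger and OnDiskMerger, every pending output goes into a single pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical bounded selection of merge inputs.
class BoundedMergeSelector {
  static <T> List<T> selectForMerge(LinkedList<T> pending, int mergeFactor) {
    List<T> batch = new ArrayList<>();
    while (!pending.isEmpty() && batch.size() < mergeFactor) {
      batch.add(pending.removeFirst());  // removeFirst() dequeues; getFirst() would only peek
    }
    return batch;
  }

  public static void main(String[] args) {
    LinkedList<Integer> pending = new LinkedList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
    List<Integer> bounded = selectForMerge(pending, 4);
    System.out.println(bounded + " left=" + pending.size());    // [1, 2, 3, 4] left=6

    List<Integer> unbounded = selectForMerge(pending, Integer.MAX_VALUE);
    System.out.println(unbounded + " left=" + pending.size());  // [5, 6, 7, 8, 9, 10] left=0
  }
}
```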

Let us compare the logs (with and without the patches) and draw conclusions 
from there.

Thanks for sharing the information.

-- Asokan

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: mapreduce-4842.patch, MAPREDUCE-4842.patch, 
 MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-05 Thread Arun C Murthy (JIRA)


Arun C Murthy updated MAPREDUCE-4842:
-

Status: Open  (was: Patch Available)

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.5, 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Arun C Murthy
Priority: Blocker
 Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-05 Thread Arun C Murthy (JIRA)


Arun C Murthy updated MAPREDUCE-4842:
-

Attachment: MAPREDUCE-4842.patch

Jason, nice unit test! Thanks!

I've modified it a little to use 2 barriers (mergeStart and mergeComplete) 
rather than reusing the same one 4 times (which confused me a lot when I was 
reviewing it).
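The two-barrier pattern can be sketched as follows: the merge thread and the test rendezvous once when the merge starts and once before it completes, so each barrier has a single meaning. Only the names mergeStart and mergeComplete come from the comment; TwoBarrierHarness, the stub merge() and runScenario() are hypothetical and are not the test in the patch.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical harness with one barrier per phase of the merge.
class TwoBarrierHarness {
  final CyclicBarrier mergeStart = new CyclicBarrier(2);
  final CyclicBarrier mergeComplete = new CyclicBarrier(2);
  final AtomicBoolean merging = new AtomicBoolean(false);

  // Body of the stub merge, run on the merge thread.
  void merge() throws Exception {
    merging.set(true);
    mergeStart.await();     // rendezvous 1: the test now knows the merge is running
    mergeComplete.await();  // rendezvous 2: wait until the test lets the merge finish
    merging.set(false);
  }

  // Returns true if the test observed the merge running between the two
  // rendezvous points and saw it finish after releasing it.
  static boolean runScenario() {
    TwoBarrierHarness h = new TwoBarrierHarness();
    Thread mergeThread = new Thread(() -> {
      try {
        h.merge();
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    });
    mergeThread.start();
    try {
      h.mergeStart.await();                  // merge has started
      boolean sawRunning = h.merging.get();  // "during merge" assertions go here
      h.mergeComplete.await();               // let the merge finish
      mergeThread.join();
      return sawRunning && !h.merging.get();
    } catch (Exception e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(runScenario());  // prints true
  }
}
```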

Other than that, it looks great. +1

Also, if you don't mind, I'll assign the jira to you, since you've done all 
the heavy lifting and deserve way more credit than I do. Thanks again!

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Arun C Murthy
Priority: Blocker
 Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, 
 MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-05 Thread Jason Lowe (JIRA)


Jason Lowe updated MAPREDUCE-4842:
--

Attachment: MAPREDUCE-4842.patch

Thanks for the reviews, Alejandro and Arun.  I updated the patch to address 
Alejandro's comment and also added a comment clarifying why the merge callback 
occurs outside of the lock and after inProgress is cleared per a side 
discussion with Arun.
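The clarifying comment itself is not reproduced in this thread. The sketch below shows one plausible reading of the ordering, assuming the callback may request a follow-up merge: if the callback ran before inProgress was cleared, clearing the flag afterwards would erase that request. Running it outside the lock also avoids holding the merge thread's monitor while calling back into other code, though that motivation is an assumption. OrderedMergeThread and runOneMerge() are hypothetical (the class models only the completion path and is not a Thread); onSuccessfulMerge is the callback name used in this issue.

```java
import java.util.function.Consumer;

// Hypothetical model of a merge thread's completion path.
class OrderedMergeThread {
  private boolean inProgress = false;
  private final Consumer<OrderedMergeThread> onSuccessfulMerge;

  OrderedMergeThread(Consumer<OrderedMergeThread> onSuccessfulMerge) {
    this.onSuccessfulMerge = onSuccessfulMerge;
  }

  synchronized void startMerge() {
    inProgress = true;
  }

  synchronized boolean isInProgress() {
    return inProgress;
  }

  void runOneMerge(Runnable mergeBody) {
    mergeBody.run();
    synchronized (this) {
      inProgress = false;
      notifyAll();                  // release anything waiting on this merge
    }
    onSuccessfulMerge.accept(this); // outside the lock, after the flag is cleared
  }

  public static void main(String[] args) {
    // The callback immediately requests a follow-up merge.
    OrderedMergeThread t = new OrderedMergeThread(OrderedMergeThread::startMerge);
    t.startMerge();
    t.runOneMerge(() -> { /* merge work would happen here */ });
    System.out.println(t.isInProgress());  // prints true: the follow-up merge is not lost
  }
}
```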

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, 
 MAPREDUCE-4842.patch, MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-04 Thread Arun C Murthy (JIRA)


Arun C Murthy updated MAPREDUCE-4842:
-

Attachment: MAPREDUCE-4842.patch

Great catch Jason! Thanks!

It seems like we are missing a hook in MergeThread.run to re-check the 
condition and trigger another merge at the end of the merge itself.
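The missing hook can be illustrated with a deliberately simplified, single-threaded model. It assumes a merge is triggered only when commitMemory reaches mergeThreshold and no merge is already running; MergeTriggerModel, simulate() and the recheckOnCompletion flag are hypothetical. Without the re-check, bytes committed during a merge are stranded once it finishes, which is consistent with the reported symptom of fetchers being told to WAIT while no merge runs.

```java
// Hypothetical model of merge triggering with and without an end-of-merge re-check.
class MergeTriggerModel {
  private final long mergeThreshold;
  private final boolean recheckOnCompletion;  // true = hook present
  private long commitMemory = 0;              // committed bytes not yet handed to a merge
  private boolean mergeInProgress = false;
  private int mergesStarted = 0;

  MergeTriggerModel(long mergeThreshold, boolean recheckOnCompletion) {
    this.mergeThreshold = mergeThreshold;
    this.recheckOnCompletion = recheckOnCompletion;
  }

  // A fetcher commits an in-memory map output.
  void commit(long bytes) {
    commitMemory += bytes;
    maybeStartMerge();
  }

  private void maybeStartMerge() {
    // A commit that arrives while a merge is running cannot start another one.
    if (!mergeInProgress && commitMemory >= mergeThreshold) {
      mergeInProgress = true;
      mergesStarted++;
      commitMemory = 0;
    }
  }

  // End of one merge in the merge thread.
  void mergeFinished() {
    mergeInProgress = false;
    if (recheckOnCompletion) {
      maybeStartMerge();  // the hook: re-check the trigger condition now
    }
  }

  static int simulate(boolean hook) {
    MergeTriggerModel m = new MergeTriggerModel(100, hook);
    m.commit(100);      // triggers merge #1
    m.commit(100);      // arrives mid-merge: threshold reached, but no merge starts
    m.mergeFinished();  // merge #1 done
    return m.mergesStarted;
  }

  public static void main(String[] args) {
    System.out.println("without hook: " + simulate(false));  // 1 (100 bytes stranded)
    System.out.println("with hook:    " + simulate(true));   // 2
  }
}
```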

Here is an illustrative patch.

Thoughts?

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
 Attachments: MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-04 Thread Arun C Murthy (JIRA)


Arun C Murthy updated MAPREDUCE-4842:
-

Priority: Blocker  (was: Major)

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Priority: Blocker
 Attachments: MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-04 Thread Arun C Murthy (JIRA)


Arun C Murthy updated MAPREDUCE-4842:
-

Affects Version/s: (was: 2.0.3-alpha)
   2.0.2-alpha

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Priority: Blocker
 Attachments: MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-04 Thread Jason Lowe (JIRA)


Jason Lowe updated MAPREDUCE-4842:
--

Attachment: MAPREDUCE-4842.patch

Updated the patch to add a test case and rename checkAndRestartMerge to 
onSuccessfulMerge.

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Priority: Blocker
 Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch




[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-04 Thread Jason Lowe (JIRA)


Jason Lowe updated MAPREDUCE-4842:
--

Assignee: Arun C Murthy
Target Version/s: 2.0.3-alpha, 0.23.6
  Status: Patch Available  (was: Open)

 Shuffle race can hang reducer
 -

 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.5, 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Arun C Murthy
Priority: Blocker
 Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch

