[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484036#comment-13484036
 ] 

Hudson commented on MAPREDUCE-4730:
---

Integrated in Hadoop-Yarn-trunk #16 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/16/])
MAPREDUCE-4730. Fix Reducer's EventFetcher to scale the map-completion 
requests slowly to avoid HADOOP-8942. Contributed by Jason Lowe. (Revision 
1401941)

 Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401941
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestEventFetcher.java


 AM crashes due to OOM while serving up map task completion events
 -

 Key: MAPREDUCE-4730
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4730
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Fix For: 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4730.patch, MAPREDUCE-4730.patch, 
 MAPREDUCE-4730.patch


 We're seeing a repeatable OOM crash in the AM for a task with around 3 
 maps and 3000 reducers.  Details to follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484079#comment-13484079
 ] 

Hudson commented on MAPREDUCE-4730:
---

Integrated in Hadoop-Hdfs-0.23-Build #415 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/415/])
MAPREDUCE-4730. Fix Reducer's EventFetcher to scale the map-completion 
requests slowly to avoid HADOOP-8942. Contributed by Jason Lowe.
svn merge --ignore-ancestry -c 1401941 ../../trunk/ (Revision 1401943)

 Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401943
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestEventFetcher.java




[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484095#comment-13484095
 ] 

Hudson commented on MAPREDUCE-4730:
---

Integrated in Hadoop-Hdfs-trunk #1206 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1206/])
MAPREDUCE-4730. Fix Reducer's EventFetcher to scale the map-completion 
requests slowly to avoid HADOOP-8942. Contributed by Jason Lowe. (Revision 
1401941)

 Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401941
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestEventFetcher.java




[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484136#comment-13484136
 ] 

Hudson commented on MAPREDUCE-4730:
---

Integrated in Hadoop-Mapreduce-trunk #1236 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1236/])
MAPREDUCE-4730. Fix Reducer's EventFetcher to scale the map-completion 
requests slowly to avoid HADOOP-8942. Contributed by Jason Lowe. (Revision 
1401941)

 Result = FAILURE
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401941
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestEventFetcher.java




[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483435#comment-13483435
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4730:


I was thinking that the connection timeouts are unrelated to HADOOP-8942.

You are right, AMScalability only runs maps, so there is no chance to uncover 
this issue.

Are these socket timeout exceptions? I remember running into those with 
gridmix, but never got to the bottom of it because of more pressing 
concerns.



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483454#comment-13483454
 ] 

Jason Lowe commented on MAPREDUCE-4730:
---

Yes, these are socket timeout exceptions.  The timeouts are somewhat related to 
HADOOP-8942 in the sense that when the AM heap starts to fill up from buffering 
all those responses, it will spend more time garbage collecting and enough 
garbage collecting leads to unresponsiveness and ultimately timeouts on some of 
the clients.



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483622#comment-13483622
 ] 

Hadoop QA commented on MAPREDUCE-4730:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12550687/MAPREDUCE-4730.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2965//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2965//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483797#comment-13483797
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4730:


Neat test case!

+1, checking this in.



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483822#comment-13483822
 ] 

Hudson commented on MAPREDUCE-4730:
---

Integrated in Hadoop-trunk-Commit #2925 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2925/])
MAPREDUCE-4730. Fix Reducer's EventFetcher to scale the map-completion 
requests slowly to avoid HADOOP-8942. Contributed by Jason Lowe. (Revision 
1401941)

 Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401941
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestEventFetcher.java




[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481862#comment-13481862
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4730:


Patch looks good.

Can you try writing a simple test for EventFetcher? You can mock 
umbilicalProtocol, shuffleScheduler and reporter I suppose. Then you can 
validate your current change also. Let me know if it becomes too cumbersome.

bq. The only issue I ran into was a significant number of maps and reduces 
failed because they timed out trying to establish a connection to the AM.
This is new. I don't remember us running into it when we ran AMScalability. Can 
you file a bug? More details would be great to have.



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480289#comment-13480289
 ] 

Jason Lowe commented on MAPREDUCE-4730:
---

Update on testing: I was able to test this (along with the fix for 
MAPREDUCE-4733) using a sleep job with 2 maps and 3000 reduces on a cluster 
big enough to mass-launch the map and reduce phases.  The AM with a 1.5GB slot 
size stayed up during the job, where previously it failed even with a larger 
slot.

The only issue I ran into was a significant number of maps and reduces failed 
because they timed out trying to establish a connection to the AM.  I suspected 
the AM could have been busy garbage collecting and causing the delays, so I 
bumped up the AM size to 3G and it ran smoothly with no connection timeout 
failures from any tasks.



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479257#comment-13479257
 ] 

Jason Lowe commented on MAPREDUCE-4730:
---

A little more digging and I'm a bit more confident that this is a flow control 
problem in the IPC layer.  I think the scenario goes like this:

# Thousands of reducers start asking for map completion events at about the same time
# An IPC Server.Handler thread fields a call off the queue, makes the call and 
gets 900K of data
# The Handler thread queues up the response data to the connection, likely sees it's 
the only thing in the queue, and tries to push out the data
# It's too big to send it all without blocking, so it pushes the remainder back 
onto the response queue for the Responder thread to deal with and moves on to 
another call from the call queue
# Lots of reducers are queueing up in the call queue to get their 900K of data, 
and the handler threads are processing them and pushing that data on the 
response queues as fast as they can
# The Responder thread and/or socket I/O can't keep pace with the rate at which 
handlers are generating 900K responses, and we eventually exhaust memory
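
The scenario above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hadoop's IPC code: the class and method names are hypothetical, time is divided into abstract ticks, and the handler and responder rates are made-up illustrative numbers. It suggests that when handlers queue 900K responses faster than the responder drains them, the buffered bytes grow roughly with the number of callers.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the backlog described above; hypothetical, not Hadoop's IPC code. */
class ResponseBacklogModel {
    /**
     * Returns the peak number of bytes buffered when handlers enqueue
     * responses each tick and the responder drains a bounded number of
     * bytes each tick. drainedBytesPerTick must be positive.
     */
    static long peakBufferedBytes(int callers, long responseBytes,
                                  int handledPerTick, long drainedBytesPerTick) {
        Deque<Long> pending = new ArrayDeque<>(); // remaining bytes per queued response
        long buffered = 0;
        long peak = 0;
        int remainingCalls = callers;
        while (remainingCalls > 0 || !pending.isEmpty()) {
            // Handlers: take calls off the call queue and queue full responses.
            int handled = Math.min(handledPerTick, remainingCalls);
            for (int i = 0; i < handled; i++) {
                pending.addLast(responseBytes);
                buffered += responseBytes;
            }
            remainingCalls -= handled;
            peak = Math.max(peak, buffered);
            // Responder: send at most drainedBytesPerTick bytes this tick.
            long budget = drainedBytesPerTick;
            while (budget > 0 && !pending.isEmpty()) {
                long head = pending.removeFirst();
                long sent = Math.min(head, budget);
                budget -= sent;
                buffered -= sent;
                if (head > sent) {
                    pending.addFirst(head - sent); // partially sent response stays queued
                }
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        long response = 900L * 1024; // 900K per response, as in the scenario
        // Assumed rates: 100 calls handled per tick, 10 responses' worth drained per tick.
        long peak = peakBufferedBytes(3000, response, 100, 10 * response);
        System.out.println("peak buffered bytes: " + peak);
    }
}
```

In this model the only thing bounding the backlog is the number of callers, which is why limiting the size of each response (or the rate of requests) on the client side reduces the peak.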





[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-18 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479286#comment-13479286
 ] 

Robert Joseph Evans commented on MAPREDUCE-4730:


The patch is simple enough; if Jenkins comes back OK, I am +1 on it.



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479290#comment-13479290
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4730:


Great analysis! 900K * 3000 reducers = 2.7GB, so the numbers are adding up.

Instead of hard-coding it, each reducer could base it on the total number of 
reducers for the job (from configuration)?
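
As a worked version of the arithmetic and of the suggestion above, here is a minimal sketch. Everything in it is hypothetical: the class and method names, the idea of a fixed response-memory budget, and the 90-bytes-per-event figure are illustrative assumptions, not values taken from the patch.

```java
/** Hypothetical sizing helpers; not part of the MAPREDUCE-4730 patch. */
class FetchSizing {
    /** Worst case: every reducer has one full response buffered at once. */
    static long worstCaseBufferedBytes(long responseBytes, int reducers) {
        return responseBytes * reducers;
    }

    /**
     * Splits an assumed response-memory budget evenly across reducers and
     * converts it to a per-request event cap, clamped to [1, hardMax].
     */
    static int maxEventsPerFetch(long budgetBytes, int reducers,
                                 long bytesPerEvent, int hardMax) {
        long perReducerBytes = budgetBytes / Math.max(1, reducers);
        long events = perReducerBytes / Math.max(1L, bytesPerEvent);
        return (int) Math.max(1L, Math.min((long) hardMax, events));
    }

    public static void main(String[] args) {
        // 900K (decimal) per response times 3000 reducers is 2.7GB.
        System.out.println(worstCaseBufferedBytes(900_000L, 3000));
        // Assumed 270MB budget and 90 bytes per event (illustrative only).
        System.out.println(maxEventsPerFetch(270_000_000L, 3000, 90L, 10_000));
        System.out.println(maxEventsPerFetch(270_000_000L, 10, 90L, 10_000));
    }
}
```

With these assumed inputs, a 3000-reducer job gets a much smaller per-request cap than a 10-reducer job, which is the behavior the suggestion is after: small jobs keep large fetches while large jobs are throttled.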



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479291#comment-13479291
 ] 

Hadoop QA commented on MAPREDUCE-4730:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12549736/MAPREDUCE-4730.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2939//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2939//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479296#comment-13479296
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4730:


Also, lessening it has performance implications on small jobs (not sure how 
much) given the fetcher loop runs every 1 second irrespective of whether there 
are more events or not. So, hate to propose it, but shall we add in a config to 
override this?



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479311#comment-13479311
 ] 

Jason Lowe commented on MAPREDUCE-4730:
---

Is the 1 second sleep necessary?  Seems like we could eliminate that sleep if 
we got a maximum-sized response?
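
A rough sketch of that idea, under stated assumptions: the class, interface, and method names below are hypothetical stand-ins rather than the actual EventFetcher or umbilical API, and sleeps are counted instead of performed. The loop asks again immediately whenever a response comes back full and only backs off when the response is short.

```java
/** Toy model of a polling loop; hypothetical names, not the EventFetcher code. */
class FetchLoopSketch {
    /** Hypothetical stand-in for the call to the AM: returns how many events came back. */
    interface EventSource {
        int fetch(int fromEventId, int maxEvents);
    }

    /**
     * Polls until totalEvents have been seen and returns how many sleeps
     * were needed. With sleepOnlyWhenShort=false the loop sleeps after every
     * fetch; with true it sleeps only when the response was not full.
     */
    static int sleepsNeeded(EventSource source, int maxEvents, int totalEvents,
                            boolean sleepOnlyWhenShort) {
        int fromEventId = 0;
        int sleeps = 0;
        while (fromEventId < totalEvents) {
            int got = source.fetch(fromEventId, maxEvents);
            fromEventId += got;
            if (fromEventId >= totalEvents) {
                break; // everything has been fetched
            }
            boolean full = got == maxEvents;
            if (!sleepOnlyWhenShort || !full) {
                sleeps++; // a real loop would sleep here before polling again
            }
        }
        return sleeps;
    }

    public static void main(String[] args) {
        final int available = 2500; // assume all events are already available
        EventSource am = (from, max) -> Math.min(max, available - from);
        System.out.println(sleepsNeeded(am, 1000, available, false));
        System.out.println(sleepsNeeded(am, 1000, available, true));
    }
}
```

The point of the comparison is that a smaller per-request cap need not slow down catching up on a backlog of events, as long as full responses trigger an immediate follow-up request.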



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479432#comment-13479432
 ] 

Jason Lowe commented on MAPREDUCE-4730:
---

Filed MAPREDUCE-4733 to track the filtering/windowing issue in 
TaskAttemptListenerImpl.getMapCompletionEvents



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479532#comment-13479532
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4730:


bq. Seems like we could eliminate that sleep if we got a maximum-sized response?
+1.



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479633#comment-13479633
 ] 

Hadoop QA commented on MAPREDUCE-4730:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12549805/MAPREDUCE-4730.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.  Please justify why no new tests are needed for 
this patch.  Also please list what manual steps were performed to verify 
this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
-4 warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2945//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2945//console

This message is automatically generated.

 AM crashes due to OOM while serving up map task completion events
 -

 Key: MAPREDUCE-4730
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4730
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: MAPREDUCE-4730.patch, MAPREDUCE-4730.patch


 We're seeing a repeatable OOM crash in the AM for a task with around 3 
 maps and 3000 reducers.  Details to follow.



[jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events

2012-10-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478496#comment-13478496
 ] 

Jason Lowe commented on MAPREDUCE-4730:
---

Here's what I have gathered so far from a heap dump of an AM attempt that was 
just about to run out of memory.  Most of the memory was consumed by byte 
buffers; specifically, it looked like most of them were RPC response buffers.

I think there might be a flow control issue in the IPC layer that led to this. 
 More than half of the mappers finished before the first reducer started, and 
thousands of reducers all launched within a few seconds of each other.  They 
all asked the AM for map completion task events, which currently caps the 
response to 1 events per query.  Since more than 1 maps completed 
before the first reducers started, each reducer saw a full event list which 
took around 900K for each response buffer.  There were many IPC Handler threads 
to service the calls, but only one Responder thread to send out the rather 
large response buffers.  I see there's a blocking queue to prevent too many 
calls from coming in at once, but I didn't see any flow control between the 
Handlers and the Responder thread.  It appears that as long as the Handler 
threads can keep the call queue relatively low, they can continue to queue 
up call response data faster than the Responder thread can send it out.  
Eventually this will exhaust available memory leading to an OOM.
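
To put rough numbers on the analysis above: if each of the 3000 reducers had one response of around 900K pending at the same time, the queued buffers alone would come to roughly 2.7 GB. The sketch below shows one generic way to add the missing flow control, a bounded queue between the producing (Handler) side and the sending (Responder) side. It is not Hadoop's IPC code and not the fix applied for this issue; BoundedResponseQueueSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical bounded hand-off between handler threads and one responder. */
final class BoundedResponseQueueSketch {
    private final BlockingQueue<byte[]> pending;

    BoundedResponseQueueSketch(int maxPendingResponses) {
        // The fixed capacity is the flow control: memory held by queued
        // responses is capped at capacity times the largest response size.
        this.pending = new ArrayBlockingQueue<>(maxPendingResponses);
    }

    /** Handler side: waits up to timeoutMillis for space; false if still full. */
    boolean enqueue(byte[] response, long timeoutMillis) {
        try {
            return pending.offer(response, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    /** Responder side: takes the next response to send, or null if none. */
    byte[] nextToSend() {
        return pending.poll();
    }

    int pendingCount() {
        return pending.size();
    }

    /** Estimate of queued bytes if every reducer has one response pending. */
    static long worstCaseBytes(int reducers, long bytesPerResponse) {
        return reducers * bytesPerResponse;
    }
}
```

With a bound in place, a slow responder makes the handlers wait (or reject work) instead of letting response buffers pile up until the heap is exhausted.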

 AM crashes due to OOM while serving up map task completion events
 -

 Key: MAPREDUCE-4730
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4730
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3
Reporter: Jason Lowe
Priority: Blocker

 We're seeing a repeatable OOM crash in the AM for a task with around 3 
 maps and 3000 reducers.  Details to follow.
