[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

2011-11-12 Thread Ahmed Radwan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated MAPREDUCE-3343:


Attachment: MAPREDUCE-3343_rev2.patch

Here is zhaoyunjiong's patch incorporating Eli's additional comments.

 TaskTracker Out of Memory because of distributed cache
 --

 Key: MAPREDUCE-3343
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.20.205.0
Reporter: Ahmed Radwan
Assignee: zhaoyunjiong
  Labels: mapreduce, patch
 Attachments: MAPREDUCE-3343_rev2.patch, 
 mapreduce-3343-release-0.20.205.0.patch


 This out-of-memory error happens when you run a large number of jobs (using 
 the distributed cache) on a TaskTracker. 
 The basic issue seems to be with the distributedCacheManager (an instance of 
 TrackerDistributedCacheManager in TaskTracker.java). It gets created during 
 TaskTracker.initialize(), and it keeps references to a 
 TaskDistributedCacheManager for every submitted job via the jobArchives map, 
 as well as references to CacheStatus via the cachedArchives map. I am not 
 seeing these cleaned up between jobs, so this can cause out-of-memory 
 problems after a really large number of jobs are submitted. We have seen 
 this issue in a number of cases.
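
The leak pattern described above can be illustrated with a minimal sketch. It is not the code from the attached patch: JobCacheRegistry and its methods are hypothetical names, and the map only plays the role the description attributes to jobArchives. The point is that a long-lived manager must drop its per-job entry when the job completes, otherwise the map grows with every submitted job.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a tracker-wide manager that holds per-job state. */
class JobCacheRegistry {
    // Keyed by job ID; analogous in role to the jobArchives map in the report.
    private final Map<String, Object> perJobManagers = new ConcurrentHashMap<>();

    /** Called when a job is set up: remembers the job's cache state. */
    void register(String jobId, Object manager) {
        perJobManagers.put(jobId, manager);
    }

    /** Called when a job finishes: drops the reference so it can be collected. */
    boolean release(String jobId) {
        return perJobManagers.remove(jobId) != null;
    }

    /** Number of jobs whose state is still referenced. */
    int trackedJobs() {
        return perJobManagers.size();
    }

    public static void main(String[] args) {
        JobCacheRegistry registry = new JobCacheRegistry();
        for (int i = 0; i < 1000; i++) {
            registry.register("job_" + i, new byte[1024]);
        }
        // Without release() every entry stays reachable for the manager's lifetime.
        System.out.println("tracked before cleanup: " + registry.trackedJobs());
        for (int i = 0; i < 1000; i++) {
            registry.release("job_" + i);
        }
        System.out.println("tracked after cleanup: " + registry.trackedJobs());
    }
}
```

Removing the entry explicitly at job completion keeps the map bounded by the number of running jobs rather than the number of jobs ever submitted.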

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

2011-11-12 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149009#comment-13149009
 ] 

Hadoop QA commented on MAPREDUCE-3343:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12503477/MAPREDUCE-3343_rev2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1297//console

This message is automatically generated.





[jira] [Commented] (MAPREDUCE-3393) TestMRJobs, TestMROldApiJobs, and TestUberAM failures

2011-11-12 Thread Thomas Graves (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149072#comment-13149072
 ] 

Thomas Graves commented on MAPREDUCE-3393:
--

Yes, JAVA_HOME is set.  Did you run them individually or all the tests 
together?  Sorry, I should have said this originally: they only fail when I 
run them all together. I could not get them to fail when run individually. 
I'll attach logs.

 TestMRJobs, TestMROldApiJobs, and TestUberAM failures
 -

 Key: MAPREDUCE-3393
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3393
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
Assignee: Hitesh Shah
 Attachments: MR-3393.1.patch, MR-3393.2.patch


 Check out branch 0.23 and run mvn test from the hadoop-mapreduce-project 
 directory
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapred.TestClientServiceDelegate
 Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.717 sec
 Running org.apache.hadoop.mapred.TestClientRedirect
 Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.436 sec
 Running org.apache.hadoop.mapreduce.TestYarnClientProtocolProvider
 Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.975 sec
 Running org.apache.hadoop.mapreduce.v2.TestMRJobs
 Tests run: 4, Failures: 3, Errors: 1, Skipped: 0, Time elapsed: 67.999 sec <<< FAILURE!
 Running org.apache.hadoop.mapreduce.v2.TestYARNRunner
 Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.976 sec
 Running org.apache.hadoop.mapreduce.v2.TestMROldApiJobs
 Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 31.879 sec <<< FAILURE!
 Running org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
 Running org.apache.hadoop.mapreduce.v2.TestUberAM
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 101.096 sec <<< FAILURE!
 Results :
 Failed tests:   testSleepJob(org.apache.hadoop.mapreduce.v2.TestMRJobs)
   testRandomWriter(org.apache.hadoop.mapreduce.v2.TestMRJobs)
   testDistributedCache(org.apache.hadoop.mapreduce.v2.TestMRJobs)
   testJobSucceed(org.apache.hadoop.mapreduce.v2.TestMROldApiJobs): Job 
 expected to succeed failed
   testJobFail(org.apache.hadoop.mapreduce.v2.TestMROldApiJobs)
 Tests in error: 
   testFailingMapper(org.apache.hadoop.mapreduce.v2.TestMRJobs): 0
   org.apache.hadoop.mapreduce.v2.TestUberAM: Failed to Start 
 org.apache.hadoop.mapreduce.v2.TestMRJobs
 Tests run: 19, Failures: 5, Errors: 2, Skipped: 0





[jira] [Commented] (MAPREDUCE-3393) TestMRJobs, TestMROldApiJobs, and TestUberAM failures

2011-11-12 Thread Thomas Graves (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149075#comment-13149075
 ] 

Thomas Graves commented on MAPREDUCE-3393:
--

TestMROldApiJobs and TestUberAM both fail with the exception below.  So 
perhaps something isn't being shut down cleanly in a test that runs before 
them, or when TestMRJobs fails.  If you can't reproduce it, let me know and 
I'll look at the failures.

2011-11-11 17:20:44,452 ERROR [Thread-4] service.CompositeService (CompositeService.java:start(72)) - Error starting services ResourceManager
org.apache.hadoop.yarn.YarnException: java.net.BindException: Problem binding to [0.0.0.0:8025] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
    at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:125)
    at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:63)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.start(ResourceTrackerService.java:125)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.start(ResourceManager.java:439)
    at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper$2.run(MiniYARNCluster.java:126)
Caused by: java.net.BindException: Problem binding to [0.0.0.0:8025] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:606)
    at org.apache.hadoop.ipc.Server.bind(Server.java:230)
    at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:310)
    at org.apache.hadoop.ipc.Server.<init>(Server.java:1591)
    at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:576)
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.<init>(ProtoOverHadoopRpcEngine.java:314)
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine.getServer(ProtoOverHadoopRpcEngine.java:390)
    at org.apache.hadoop.ipc.RPC.getServer(RPC.java:550)
    at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:155)
    at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:118)
    ... 5 more
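
The trace above is consistent with a second server trying to bind a fixed port (8025) that an earlier test left open. As a rough illustration using only java.net, and not the Hadoop RPC classes or the fix in the attached patches, the sketch below reproduces that failure mode and shows one common way tests avoid it: asking for port 0 so the OS assigns a free port. PortBindingDemo and its method names are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical demo class; not part of Hadoop. */
class PortBindingDemo {

    /** Returns true if binding a second socket to an in-use port fails. */
    static boolean secondBindFails() {
        // Port 0 asks the OS for any free port; this socket stays open below.
        try (ServerSocket first = new ServerSocket(0)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket()) {
                second.setReuseAddress(false);
                // Same port while 'first' is still listening: expected to fail.
                second.bind(new InetSocketAddress(port));
                return false;
            } catch (BindException e) {
                return true; // "Address already in use"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Two servers that both request port 0 get distinct ports. */
    static boolean ephemeralPortsDiffer() {
        try (ServerSocket a = new ServerSocket(0);
             ServerSocket b = new ServerSocket(0)) {
            return a.getLocalPort() != b.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the port must stay fixed, the alternative is to make sure the earlier server is fully stopped before the next one starts, which matches the suspicion that something is not being shut down cleanly.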


[jira] [Updated] (MAPREDUCE-3393) TestMRJobs, TestMROldApiJobs, and TestUberAM failures

2011-11-12 Thread Thomas Graves (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-3393:
-

Attachment: org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt





[jira] [Updated] (MAPREDUCE-3395) Add mapred.disk.healthChecker.interval to mapred-default.xml

2011-11-12 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated MAPREDUCE-3395:
---

Attachment: mapreduce-3395-2.patch

Nit-corrected patch. Committing.

 Add mapred.disk.healthChecker.interval to mapred-default.xml
 

 Key: MAPREDUCE-3395
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3395
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.20.205.0
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Trivial
 Attachments: mapreduce-3395-1.patch, mapreduce-3395-2.patch


 Let's add mapred.disk.healthChecker.interval to mapred-default.xml.





[jira] [Updated] (MAPREDUCE-3395) Add mapred.disk.healthChecker.interval to mapred-default.xml

2011-11-12 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated MAPREDUCE-3395:
---

Fix Version/s: 0.20.206.0

Committing to 0.20.206. Thanks Eli!





[jira] [Resolved] (MAPREDUCE-3395) Add mapred.disk.healthChecker.interval to mapred-default.xml

2011-11-12 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved MAPREDUCE-3395.


Resolution: Fixed





[jira] [Commented] (MAPREDUCE-2863) Support web-services for RM & NM

2011-11-12 Thread Thomas Graves (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149178#comment-13149178
 ] 

Thomas Graves commented on MAPREDUCE-2863:
--

Hey Hitesh, 

Thanks for the feedback. We could easily change the JSON to match or be 
closer. Right now it's configured for the POJO output format.  We have a few 
options: 
http://jersey.java.net/nonav/apidocs/latest/jersey/com/sun/jersey/api/json/JSONConfiguration.Notation.html

Any input on which one people prefer?  When I turn off POJO, the default is 
the mapped notation, and the output looks like:

http://virt09-pv1.tgraves.pool.corp.sp2.yahoo.com:8088/ws/v1/cluster/apps
{
   app : {
      finalStatus : UNDEFINED,
      finishedTime : 0,
      progress : 0.0,
      name : word count,
      startedTime : 1321112670525,
      amContainerLogs : http://host:/node/containerlogs/container_1321112633248_0001_01_01,
      elapsedTime : 8681,
      note : ,
      trackingUI : UNASSIGNED,
      state : ACCEPTED,
      appId : application_1321112633248_0001,
      trackingUrl : UNASSIGNED,
      user : tgraves,
      queue : default,
      clusterId : 1321112633248
   }
}
 
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<apps>
  <app>
    <appId>application_1321112633248_0001</appId>
    <user>tgraves</user>
    <name>word count</name>
    <queue>default</queue>
    <state>ACCEPTED</state>
    <progress>0.0</progress>
    <trackingUI>UNASSIGNED</trackingUI>
    <trackingUrl>UNASSIGNED</trackingUrl>
    <note/>
    <finalStatus>UNDEFINED</finalStatus>
    <startedTime>1321112670525</startedTime>
    <finishedTime>0</finishedTime>
    <elapsedTime>8717</elapsedTime>
    <amContainerLogs>http://host:/node/containerlogs/container_1321112633248_0001_01_01</amContainerLogs>
    <clusterId>1321112633248</clusterId>
  </app>
</apps>


 Support web-services for RM & NM
 

 Key: MAPREDUCE-2863
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2863
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2, nodemanager, resourcemanager
Reporter: Arun C Murthy
Assignee: Thomas Graves
 Attachments: MAPREDUCE-2863.patch, nmoutput.txt, rmoutput.txt


 It will be very useful for RM and NM to support web-services to export 
 json/xml.
