[jira] [Created] (MAPREDUCE-6175) RM failover leads to failure to recognize application attempt

2014-11-26 Thread Cindy Li (JIRA)
Cindy Li created MAPREDUCE-6175:
---

             Summary: RM failover leads to failure to recognize application attempt
                 Key: MAPREDUCE-6175
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6175
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Cindy Li


Seeing this from client side (Hive job): 
2014-11-25 10:00:50,179 Stage-3 map = 100%,  reduce = 99%, Cumulative CPU 136560.72 sec
2014-11-25 10:00:54,776 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 42627.6 sec
2014-11-25 10:01:20,097 Stage-3 map = 0%,  reduce = 0%
2014-11-25 10:02:20,348 Stage-3 map = 0%,  reduce = 0%
2014-11-25 10:02:30,702 Stage-3 map = 1%,  reduce = 0%, Cumulative CPU 1511.98 sec




Seeing this in the resource manager (rm2) log:

[14:16]:[hadoop@phxaishdc20en0008-be:logs]# grep container_1416845430616_0009_01_01 *
hadoop-hadoop-resourcemanager.log.2014-11-25-10:2014-11-25 10:01:09,757 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: received container statuses on node manager register :[ContainerStatus: [ContainerId: container_1416845430616_0009_01_01, State: COMPLETE, Diagnostics: , ExitStatus: 0, ], ContainerStatus: [ContainerId: container_1416845430616_0014_01_000241, State: COMPLETE, Diagnostics: Container Killed by ResourceManager

Seeing this in the container log (logs for container_1416845430616_0009_01_01):
2014-11-25 10:59:17,839 INFO [IPC Server handler 2 on 36552] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1416845430616_0009_r_00_0 is : 0.9893525
2014-11-25 10:59:47,905 INFO [IPC Server handler 26 on 36552] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1416845430616_0009_r_00_0 is : 0.9893525
2014-11-25 11:00:01,672 INFO [RMCommunicator Allocator] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2014-11-25 11:00:17,977 INFO [IPC Server handler 5 on 36552] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1416845430616_0009_r_00_0 is : 0.9893525
2014-11-25 11:00:36,882 INFO [RMCommunicator Allocator] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
2014-11-25 11:00:48,044 INFO [IPC Server handler 2 on 36552] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1416845430616_0009_r_00_0 is : 0.9893525
2014-11-25 11:00:52,901 INFO [RMCommunicator Allocator] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2014-11-25 11:00:52,912 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Error communicating with RM: Resource Manager doesn't recognize AttemptId: application_1416845430616_0009
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Resource Manager doesn't recognize AttemptId: application_1416845430616_0009
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:580)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:220)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:268)
at java.lang.Thread.run(Thread.java:745)
2014-11-25 11:00:52,915 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1416845430616_0009Job Transitioned from RUNNING to REBOOT
2014-11-25 11:00:52,916 INFO [Thread-3928] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator isAMLastRetry: false
2014-11-25 11:00:52,916 INFO [Thread-3928] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator notified that shouldUnregistered is: false
2014-11-25 11:00:52,916 INFO [Thread-3928] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: false
2014-11-25 11:00:52,916 INFO [Thread-3928] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: JobHistoryEventHandler notified that forceJobCompletion is false
2014-11-25 11:00:52,916 INFO [Thread-3928] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the services
2014-11-25 11:00:52,917 INFO [Thread-3928] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping JobHistoryEventHandler. Size of the outstanding queue size is 0
2014-11-25 11:00:52,946 INFO [Thread-3928] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2014-11-25 11:00:52,947 INFO [Thread-3928] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1416845430616_0009_r_00_0
2014-11-25 11:00:52,955 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1416845430616_0009_r_00_0 TaskAttempt Transitioned from RUNNING to KILLED
2014-11-25 

[jira] [Created] (MAPREDUCE-5563) TestMiniMRChildTask test cases failing on trunk

2013-10-04 Thread Cindy Li (JIRA)
Cindy Li created MAPREDUCE-5563:
---

             Summary: TestMiniMRChildTask test cases failing on trunk
                 Key: MAPREDUCE-5563
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5563
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: trunk
            Reporter: Cindy Li
            Priority: Critical


Failed tests:
  TestMiniMRChildTask.testTaskTempDir:367 Exception in testing temp dir
  TestMiniMRChildTask.testTaskEnv:390 Exception in testing child env
  TestMiniMRChildTask.testTaskOldEnv:413 Exception in testing child env





--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (MAPREDUCE-5560) org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler failing on trunk

2013-10-03 Thread Cindy Li (JIRA)
Cindy Li created MAPREDUCE-5560:
---

             Summary: org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler failing on trunk
                 Key: MAPREDUCE-5560
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5560
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.1.2-beta
            Reporter: Cindy Li
            Priority: Critical


Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.406 sec  FAILURE! - in org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler
testBasic(org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler)  Time elapsed: 0.185 sec   FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at org.junit.Assert.assertNotNull(Assert.java:537)
at org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler.testBasic(TestCommitterEventHandler.java:263)





[jira] [Created] (MAPREDUCE-5561) org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl testcase failing on trunk

2013-10-03 Thread Cindy Li (JIRA)
Cindy Li created MAPREDUCE-5561:
---

             Summary: org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl testcase failing on trunk
                 Key: MAPREDUCE-5561
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5561
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.1.2-beta
            Reporter: Cindy Li
            Priority: Critical








[jira] [Created] (MAPREDUCE-4710) Add peak memory usage counter for each task

2012-10-05 Thread Cindy Li (JIRA)
Cindy Li created MAPREDUCE-4710:
---

             Summary: Add peak memory usage counter for each task
                 Key: MAPREDUCE-4710
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4710
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
          Components: task
    Affects Versions: 1.0.2
            Reporter: Cindy Li
            Priority: Minor


Each task has the counters PHYSICAL_MEMORY_BYTES and VIRTUAL_MEMORY_BYTES,
which are point-in-time snapshots of that task's memory usage. Snapshots alone
do not tell users the task's peak memory usage, which they need in order to
diagnose task failures, tune job parameters, or change application design.
This new feature will add two more counters for each task:
PHYSICAL_MEMORY_BYTES_MAX and VIRTUAL_MEMORY_BYTES_MAX.
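
To illustrate the difference between a snapshot counter and a peak counter, here is a minimal sketch in plain Java. It is not Hadoop's implementation and does not use the Hadoop Counters API: the class PeakMemoryTracker, its MemoryCounter enum, and the update() and get() methods are hypothetical names chosen for this example. Only the four counter names come from the description above. The sketch assumes the task samples its memory usage periodically and reports each sample through update().

```java
import java.util.EnumMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Hadoop code): keeps the latest memory snapshot
 * and the running maximum for a single task.
 */
final class PeakMemoryTracker {

    /** Counter names from the issue text; the enum itself is illustrative. */
    enum MemoryCounter {
        PHYSICAL_MEMORY_BYTES,
        VIRTUAL_MEMORY_BYTES,
        PHYSICAL_MEMORY_BYTES_MAX,
        VIRTUAL_MEMORY_BYTES_MAX
    }

    private final Map<MemoryCounter, Long> counters =
            new EnumMap<>(MemoryCounter.class);

    PeakMemoryTracker() {
        // Start every counter at zero so get() never returns null.
        for (MemoryCounter c : MemoryCounter.values()) {
            counters.put(c, 0L);
        }
    }

    /** Records one periodic sample of the task's memory usage. */
    void update(long physicalBytes, long virtualBytes) {
        // Snapshot counters are overwritten with the latest sample.
        counters.put(MemoryCounter.PHYSICAL_MEMORY_BYTES, physicalBytes);
        counters.put(MemoryCounter.VIRTUAL_MEMORY_BYTES, virtualBytes);
        // Peak counters only ever grow: keep the larger of old and new.
        counters.merge(MemoryCounter.PHYSICAL_MEMORY_BYTES_MAX,
                physicalBytes, Long::max);
        counters.merge(MemoryCounter.VIRTUAL_MEMORY_BYTES_MAX,
                virtualBytes, Long::max);
    }

    long get(MemoryCounter counter) {
        return counters.get(counter);
    }
}
```

In this sketch, after samples of (100, 400), (300, 350) and (200, 500) bytes, the snapshot counters hold the last sample (200 and 500) while the _MAX counters hold 300 and 500. The physical peak of 300 is the value a snapshot-only view would miss.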

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira