[jira] [Updated] (MAPREDUCE-5867) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy

2014-05-19 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated MAPREDUCE-5867:
-

Hadoop Flags: Reviewed

+1, Patch looks good to me.

 Possible NPE in KillAMPreemptionPolicy related to 
 ProportionalCapacityPreemptionPolicy
 --

 Key: MAPREDUCE-5867
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5867
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: MapReduce-5867-updated.patch, 
 MapReduce-5867-updated.patch, MapReduce-5867.2.patch, MapReduce-5867.3.patch, 
 Yarn-1980.1.patch


 I configured KillAMPreemptionPolicy for My Application Master and tried to 
 check preemption of queues.
 In one scenario I have seen the below NPE in my AM:
 2014-04-24 15:11:08,860 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM. 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.preemption.KillAMPreemptionPolicy.preempt(KillAMPreemptionPolicy.java:57)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:662)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:246)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:267)
   at java.lang.Thread.run(Thread.java:662)
 I was using 2.2.0 and merged MAPREDUCE-5189 to see how AM preemption works.
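
For illustration only (not the committed patch): a minimal sketch of the kind of null guard that would avoid this NPE, assuming the PreemptionMessage delivered while ProportionalCapacityPreemptionPolicy is active can carry only one of its two contracts. The class and method names below are hypothetical.
{code}
import org.apache.hadoop.yarn.api.records.PreemptionContainer;
import org.apache.hadoop.yarn.api.records.PreemptionMessage;

public class PreemptionMessageGuardSketch {
  static void handle(PreemptionMessage msg) {
    if (msg == null) {
      return; // nothing to preempt on this heartbeat
    }
    if (msg.getStrictContract() != null) {
      for (PreemptionContainer c : msg.getStrictContract().getContainers()) {
        // kill the task attempt running in container c ...
      }
    }
    if (msg.getContract() != null) {
      for (PreemptionContainer c : msg.getContract().getContainers()) {
        // checkpoint or kill a task for the negotiable part of the request ...
      }
    }
  }
}
{code}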



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5867) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy

2014-05-19 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated MAPREDUCE-5867:
-

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks [~sunilg] 

 Possible NPE in KillAMPreemptionPolicy related to 
 ProportionalCapacityPreemptionPolicy
 --

 Key: MAPREDUCE-5867
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5867
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Sunil G
Assignee: Sunil G
 Fix For: 3.0.0

 Attachments: MapReduce-5867-updated.patch, 
 MapReduce-5867-updated.patch, MapReduce-5867.2.patch, MapReduce-5867.3.patch, 
 Yarn-1980.1.patch


 I configured KillAMPreemptionPolicy for My Application Master and tried to 
 check preemption of queues.
 In one scenario I have seen the below NPE in my AM:
 2014-04-24 15:11:08,860 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM. 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.preemption.KillAMPreemptionPolicy.preempt(KillAMPreemptionPolicy.java:57)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:662)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:246)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:267)
   at java.lang.Thread.run(Thread.java:662)
 I was using 2.2.0 and merged MAPREDUCE-5189 to see how AM preemption works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)

2014-05-19 Thread sam liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sam liu updated MAPREDUCE-4490:
---

Status: Patch Available  (was: Open)

 JVM reuse is incompatible with LinuxTaskController (and therefore 
 incompatible with Security)
 -

 Key: MAPREDUCE-4490
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task-controller, tasktracker
Affects Versions: 1.2.1, 1.0.3, 0.20.205.0
Reporter: George Datskos
Assignee: sam liu
Priority: Critical
  Labels: patch
 Fix For: 1.2.1

 Attachments: MAPREDUCE-4490.patch, MAPREDUCE-4490.patch, 
 MAPREDUCE-4490.patch


 When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 
 1) with more map tasks in a job than there are map slots in the cluster will 
 result in immediate task failures for the second task in each JVM (and then 
 the JVM exits). We have investigated this bug and the root cause is as 
 follows. When using LinuxTaskController, the userlog directory for a task 
 attempt (../userlogs/job/task-attempt) is created only on the first 
 invocation (when the JVM is launched) because userlogs directories are 
 created by the task-controller binary which only runs *once* per JVM. 
 Therefore, attempting to create log.index is guaranteed to fail with ENOENT 
 leading to immediate task failure and child JVM exit.
 {quote}
 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting 
 logging for a new task attempt_201207241401_0013_m_27_0 in the same JVM 
 as that of the first task 
 /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_06_0
 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 ENOENT: No such file or directory
 at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
 at 
 org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
 at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
 at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
 at org.apache.hadoop.mapred.Child.main(Child.java:229)
 {quote}
 The above error occurs in a JVM which runs tasks 6 and 27.  Task6 goes 
 smoothly. Then Task27 starts. The directory 
 /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_027_0
  is never created so when mapred.Child tries to write the log.index file for 
 Task27, it fails with ENOENT because the 
 attempt_201207241401_0013_m_027_0 directory does not exist. Therefore, 
 the second task in each JVM is guaranteed to fail (and then the JVM exits) 
 every time when using LinuxTaskController. Note that this problem does not 
 occur when using the DefaultTaskController because the userlogs directories 
 are created for each task (not just for each JVM as with LinuxTaskController).
 For each task, the TaskRunner calls the TaskController's createLogDir method 
 before attempting to write out an index file.
 * DefaultTaskController#createLogDir: creates log directory for each task
 * LinuxTaskController#createLogDir: does nothing
 ** task-controller binary creates log directory [create_attempt_directories] 
 (but only for the first task)
 Possible Solution: add a new command to task-controller *initialize task* to 
 create attempt directories.  Call that command, with ShellCommandExecutor, in 
 the LinuxTaskController#createLogDir method
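
A hedged sketch of the possible solution above; the initializeTask subcommand name, argument order, and binary path are assumptions for illustration, not the actual task-controller protocol.
{code}
import java.io.IOException;

import org.apache.hadoop.util.Shell.ShellCommandExecutor;

public class AttemptLogDirSketch {
  // Assumed location of the setuid binary; site-specific in practice.
  private static final String TASK_CONTROLLER_EXE = "/usr/lib/hadoop/bin/task-controller";

  // Run the task-controller once per task attempt so that
  // .../userlogs/<jobid>/<attemptid> exists before mapred.Child writes log.index.
  static void createLogDir(String user, String jobId, String attemptId) throws IOException {
    // "initializeTask" is a hypothetical subcommand for the proposed "initialize task" command.
    String[] cmd = { TASK_CONTROLLER_EXE, user, "initializeTask", jobId, attemptId };
    ShellCommandExecutor shExec = new ShellCommandExecutor(cmd);
    shExec.execute(); // throws IOException if the binary exits non-zero
  }
}
{code}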



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2014-05-19 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-5044:
-

Attachment: MAPREDUCE-5044.v06.patch

v06 utilizes the signalContainer API provided by YARN-1515.v08.patch.

 Have AM trigger jstack on task attempts that timeout before killing them
 

 Key: MAPREDUCE-5044
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Gera Shegalov
 Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, 
 MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, 
 MAPREDUCE-5044.v06.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, Screen 
 Shot 2013-11-12 at 1.06.04 PM.png


 When an AM expires a task attempt it would be nice if it triggered a jstack 
 output via SIGQUIT before killing the task attempt.  This would be invaluable 
 for helping users debug their hung tasks, especially if they do not have 
 shell access to the nodes.
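
For illustration, a minimal sketch of the general idea, assuming direct access to the task JVM's pid on the node; the v06 patch instead goes through the signalContainer API from YARN-1515.
{code}
import java.io.IOException;

public class DumpThenKillSketch {
  // Send SIGQUIT so the hung task JVM prints a full thread dump to its stdout
  // log, wait briefly for it to flush, then kill the process.
  static void dumpThenKill(long pid) throws IOException, InterruptedException {
    new ProcessBuilder("kill", "-QUIT", Long.toString(pid)).inheritIO().start().waitFor();
    Thread.sleep(3000); // give the JVM a moment to write the stack traces
    new ProcessBuilder("kill", "-KILL", Long.toString(pid)).inheritIO().start().waitFor();
  }
}
{code}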



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive

2014-05-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001941#comment-14001941
 ] 

Jason Lowe commented on MAPREDUCE-5844:
---

Yes, the always-full-but-churning queue scenario can lead to an unfortunate 
preemption case, because the RM and AM can't know cluster resources will free 
up imminently.  I think it's reasonable to add a configurable delay before 
preempting a reducer when a map retroactively fails.  That way users can tune 
based on their cluster usage -- if they have a heavily used, high-churn cluster 
they can tune the parameter based on the average time for a container to free 
 up in their queue.  Other users who rarely run in this mode, or for whom the 
 delay would be intolerable, can set this to zero to avoid unnecessary delays in 
processing fetch-failed maps when there is no current headroom.
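
A hedged sketch of such a knob; the property name and wiring below are assumptions for illustration, not a committed change.
{code}
import org.apache.hadoop.conf.Configuration;

public class ReducerPreemptionDelaySketch {
  // Hypothetical property name and default; the eventual change may differ.
  static final String REDUCER_PREEMPT_DELAY_SEC =
      "mapreduce.job.reducer.preempt.delay.sec";
  static final int DEFAULT_REDUCER_PREEMPT_DELAY_SEC = 0; // preempt immediately by default

  private final long delayMs;
  private long mapsPendingSinceMs = -1;

  ReducerPreemptionDelaySketch(Configuration conf) {
    delayMs = conf.getInt(REDUCER_PREEMPT_DELAY_SEC,
        DEFAULT_REDUCER_PREEMPT_DELAY_SEC) * 1000L;
  }

  // Called on each heartbeat: only preempt reducers after the map request has
  // been starved of headroom for longer than the configured delay.
  boolean shouldPreemptReducers(boolean mapsStarvedOfHeadroom, long nowMs) {
    if (!mapsStarvedOfHeadroom) {
      mapsPendingSinceMs = -1; // containers freed up on their own; reset the clock
      return false;
    }
    if (mapsPendingSinceMs < 0) {
      mapsPendingSinceMs = nowMs;
    }
    return nowMs - mapsPendingSinceMs >= delayMs;
  }
}
{code}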

 Reducer Preemption is too aggressive
 

 Key: MAPREDUCE-5844
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh

 We observed cases where the reducer preemption makes the job finish much 
 later, and the preemption does not seem to be necessary since after 
 preemption both the preempted reducer and the mapper are assigned 
 immediately--meaning that there was already enough space for the mapper.
 The logic for triggering preemption is at 
 RMContainerAllocator::preemptReducesIfNeeded
 The preemption is triggered if the following is true:
 {code}
 headroom + am * |m| + pr * |r| < mapResourceRequest
 {code} 
 where am is the number of assigned mappers, |m| is the mapper size, pr is the 
 number of reducers being preempted, and |r| is the reducer size.
 The original idea apparently was that if headroom is not big enough for the 
 new mapper requests, reducers should be preempted. This would work if the job 
 is alone in the cluster. Once we have queues, the headroom calculation 
 becomes more complicated and it would require a separate headroom calculation 
 per queue/job.
 So, as a result, the headroom variable is effectively given up currently: *headroom is 
 always set to 0*. What this implies is that preemption becomes very 
 aggressive, not considering whether there is enough space for 
 the mappers or not.
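
Paraphrasing the trigger above as code for readability (this is not the actual RMContainerAllocator source):
{code}
public class PreemptionTriggerSketch {
  // With headroom effectively always 0, this fires whenever one outstanding map
  // request is larger than the resources already held by assigned maps plus the
  // reducers that are already being preempted.
  static boolean shouldPreemptReducers(long headroom, int assignedMaps, long mapSize,
      int preemptingReducers, long reduceSize, long mapResourceRequest) {
    return headroom + assignedMaps * mapSize + preemptingReducers * reduceSize
        < mapResourceRequest;
  }
}
{code}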



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive

2014-05-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002007#comment-14002007
 ] 

Karthik Kambatla commented on MAPREDUCE-5844:
-

And, we can start with a default value of zero until we see clear evidence that 
having a higher number doesn't hurt.

 Reducer Preemption is too aggressive
 

 Key: MAPREDUCE-5844
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh

 We observed cases where the reducer preemption makes the job finish much 
 later, and the preemption does not seem to be necessary since after 
 preemption both the preempted reducer and the mapper are assigned 
 immediately--meaning that there was already enough space for the mapper.
 The logic for triggering preemption is at 
 RMContainerAllocator::preemptReducesIfNeeded
 The preemption is triggered if the following is true:
 {code}
 headroom + am * |m| + pr * |r| < mapResourceRequest
 {code} 
 where am is the number of assigned mappers, |m| is the mapper size, pr is the 
 number of reducers being preempted, and |r| is the reducer size.
 The original idea apparently was that if headroom is not big enough for the 
 new mapper requests, reducers should be preempted. This would work if the job 
 is alone in the cluster. Once we have queues, the headroom calculation 
 becomes more complicated and it would require a separate headroom calculation 
 per queue/job.
 So, as a result, the headroom variable is effectively given up currently: *headroom is 
 always set to 0*. What this implies is that preemption becomes very 
 aggressive, not considering whether there is enough space for 
 the mappers or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server

2014-05-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002096#comment-14002096
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5309:


Haven't looked very carefully, but scanned through the patch and it seems 
reasonable. Can you post a summary of what the patch does for posterity? Tx.

 2.0.4 JobHistoryParser can't parse certain failed job history files generated 
 by 2.0.3 history server
 -

 Key: MAPREDUCE-5309
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Rushabh S Shah
 Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, 
 MAPREDUCE-5309-v4.patch, MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, 
 job_2_0_3-KILLED.jhist


 When the 2.0.4 JobHistoryParser tries to parse a job history file generated 
 by Hadoop 2.0.3, the JobHistoryParser throws an error:
 java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array 
 cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters
 at 
 org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58)
 at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
 at 
 org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
 at 
 org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
 at 
 org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93)
 at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
 at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156)
 at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142)
 at 
 com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
 at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
 at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
 at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
 at 
 org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
 at 
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
 Test code and the job history file are attached.
 Test code:
 package com.twitter.somepackagel;
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser;
 import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo;
 import org.junit.Test;
 import 

[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server

2014-05-19 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated MAPREDUCE-5309:
--

Status: Open  (was: Patch Available)

Current patch has a typo in one of the log statements.

 2.0.4 JobHistoryParser can't parse certain failed job history files generated 
 by 2.0.3 history server
 -

 Key: MAPREDUCE-5309
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Rushabh S Shah
 Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, 
 MAPREDUCE-5309-v4.patch, MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, 
 job_2_0_3-KILLED.jhist


 When the 2.0.4 JobHistoryParser tries to parse a job history file generated 
 by Hadoop 2.0.3, the JobHistoryParser throws an error:
 java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array 
 cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters
 at 
 org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58)
 at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
 at 
 org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
 at 
 org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
 at 
 org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93)
 at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
 at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156)
 at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142)
 at 
 com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
 at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
 at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
 at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
 at 
 org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
 at 
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
 Test code and the job history file are attached.
 Test code:
 package com.twitter.somepackagel;
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser;
 import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo;
 import org.junit.Test;
 import org.apache.hadoop.yarn.YarnException;
 public class Test20JobHistoryParsing {

   @Test
   public void testFileAvro() throws IOException
   {
   Path 

[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server

2014-05-19 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated MAPREDUCE-5309:
--

Attachment: MAPREDUCE-5309-v5.patch

Initially, EventReader#reader was initialized like:
this.reader = new SpecificDatumReader(schema, schema);
This assumed the reader schema and the writer schema are the same.
But when the schema was upgraded from 2.0.3 to 2.0.4, new fields were added in 
2.0.4 which were not present in 2.0.3. When the parser tried to parse 2.0.3 
logs (which don't have the new fields), the parser returned with errors.
So basically we need to differentiate between the new schema and the schema of 
the input jhist files, and Avro will do the rest of the mapping by field name.
For the fields that were recently added, we need to assign default values, so 
that when we parse old-schema jhist files the defaults are used.
[~vinodkv]: I hope this helps.
[~viraj]: Yes, this patch will parse both 0.23.x and 2.4.x logs.
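
A hedged illustration of the schema-resolution idea, not the exact patch; how the writer schema is obtained from the jhist header is an assumption here.
{code}
import org.apache.avro.Schema;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.hadoop.mapreduce.jobhistory.Event;

public class JhistReaderSketch {
  // Resolve the writer schema (recorded in the .jhist file by the 2.0.3 history
  // server) against the reader schema compiled into the current release; Avro
  // matches fields by name and applies declared defaults for fields the old
  // data does not have.
  static SpecificDatumReader<Event> makeReader(String schemaJsonFromJhistHeader) {
    Schema writerSchema = Schema.parse(schemaJsonFromJhistHeader);
    Schema readerSchema = Event.SCHEMA$;
    return new SpecificDatumReader<Event>(writerSchema, readerSchema);
  }
}
{code}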

 2.0.4 JobHistoryParser can't parse certain failed job history files generated 
 by 2.0.3 history server
 -

 Key: MAPREDUCE-5309
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Rushabh S Shah
 Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, 
 MAPREDUCE-5309-v4.patch, MAPREDUCE-5309-v5.patch, MAPREDUCE-5309.patch, 
 Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist


 When the 2.0.4 JobHistoryParser tries to parse a job history file generated 
 by Hadoop 2.0.3, the JobHistoryParser throws an error:
 java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array 
 cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters
 at 
 org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58)
 at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
 at 
 org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
 at 
 org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
 at 
 org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93)
 at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
 at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156)
 at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142)
 at 
 com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
 at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
 at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
 at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
 at 
 org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
 at 
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
 at 
 

[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server

2014-05-19 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated MAPREDUCE-5309:
--

Status: Patch Available  (was: Open)

 2.0.4 JobHistoryParser can't parse certain failed job history files generated 
 by 2.0.3 history server
 -

 Key: MAPREDUCE-5309
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Rushabh S Shah
 Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, 
 MAPREDUCE-5309-v4.patch, MAPREDUCE-5309-v5.patch, MAPREDUCE-5309.patch, 
 Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist


 When the 2.0.4 JobHistoryParser tries to parse a job history file generated 
 by Hadoop 2.0.3, the JobHistoryParser throws an error:
 java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array 
 cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters
 at 
 org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58)
 at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
 at 
 org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
 at 
 org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
 at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
 at 
 org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93)
 at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
 at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156)
 at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142)
 at 
 com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
 at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
 at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
 at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
 at 
 org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
 at 
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
 Test code and the job history file are attached.
 Test code:
 package com.twitter.somepackagel;
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser;
 import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo;
 import org.junit.Test;
 import org.apache.hadoop.yarn.YarnException;
 public class Test20JobHistoryParsing {

   @Test
   public void testFileAvro() throws IOException
   {
   Path local_path2 = new 

[jira] [Commented] (MAPREDUCE-5867) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy

2014-05-19 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002373#comment-14002373
 ] 

Andrew Wang commented on MAPREDUCE-5867:


Hey Devaraj, I think TestKillAMPreemptionPolicy.java was committed with CRLFs 
rather than LFs, which messes up {{git diff}} for those of us using the git 
mirror. Do you mind fixing this? Thanks.

 Possible NPE in KillAMPreemptionPolicy related to 
 ProportionalCapacityPreemptionPolicy
 --

 Key: MAPREDUCE-5867
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5867
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Sunil G
Assignee: Sunil G
 Fix For: 3.0.0

 Attachments: MapReduce-5867-updated.patch, 
 MapReduce-5867-updated.patch, MapReduce-5867.2.patch, MapReduce-5867.3.patch, 
 Yarn-1980.1.patch


 I configured KillAMPreemptionPolicy for My Application Master and tried to 
 check preemption of queues.
 In one scenario I have seen the below NPE in my AM:
 2014-04-24 15:11:08,860 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM. 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.preemption.KillAMPreemptionPolicy.preempt(KillAMPreemptionPolicy.java:57)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:662)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:246)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:267)
   at java.lang.Thread.run(Thread.java:662)
 I was using 2.2.0 and merged MAPREDUCE-5189 to see how AM preemption works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5867) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy

2014-05-19 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002412#comment-14002412
 ] 

Andrew Wang commented on MAPREDUCE-5867:


Actually, never mind, I fixed it myself. I learned something new about SVN: 
apparently we should be doing {{svn propset svn:eol-style native}} on new 
files (thanks cmccabe for the tip). I ran {{dos2unix}} to convert the newlines 
too.

 Possible NPE in KillAMPreemptionPolicy related to 
 ProportionalCapacityPreemptionPolicy
 --

 Key: MAPREDUCE-5867
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5867
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Sunil G
Assignee: Sunil G
 Fix For: 3.0.0

 Attachments: MapReduce-5867-updated.patch, 
 MapReduce-5867-updated.patch, MapReduce-5867.2.patch, MapReduce-5867.3.patch, 
 Yarn-1980.1.patch


 I configured KillAMPreemptionPolicy for My Application Master and tried to 
 check preemption of queues.
 In one scenario I have seen the below NPE in my AM:
 2014-04-24 15:11:08,860 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM. 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.preemption.KillAMPreemptionPolicy.preempt(KillAMPreemptionPolicy.java:57)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:662)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:246)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:267)
   at java.lang.Thread.run(Thread.java:662)
 I was using 2.2.0 and merged MAPREDUCE-5189 to see how AM preemption works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5790) Default map hprof profile options do not work

2014-05-19 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002782#comment-14002782
 ] 

Ming Ma commented on MAPREDUCE-5790:


It appears https://issues.apache.org/jira/browse/MAPREDUCE-5650 set

<name>mapreduce.task.profile.map.params</name>
<value>${mapreduce.task.profile.params}</value>

The reading code doesn't like it.

If you remove the default setting, things will work fine. Perhaps we can leave 
the default value empty so profiling works out of the box.
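
A hedged client-side workaround sketch, assuming the substitution bug described in this issue: set the map/reduce profile params explicitly so the literal ${mapreduce.task.profile.params} default never reaches the container command line (the hprof string below is just the stock default).
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ProfilingWorkaroundSketch {
  public static Job buildProfiledJob() throws IOException {
    Configuration conf = new Configuration();
    String hprof =
        "-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s";
    // Override the "${mapreduce.task.profile.params}" defaults directly.
    conf.set("mapreduce.task.profile.map.params", hprof);
    conf.set("mapreduce.task.profile.reduce.params", hprof);

    Job job = Job.getInstance(conf);
    job.setProfileEnabled(true);
    job.setProfileTaskRange(true, "0");
    job.setProfileTaskRange(false, "0");
    return job;
  }
}
{code}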

 Default map hprof profile options do not work
 -

 Key: MAPREDUCE-5790
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5790
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.3.0
 Environment: java version 1.6.0_31
 Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
Reporter: Andrew Wang

 I have an MR job doing the following:
 {code}
 Job job = Job.getInstance(conf);
 // Enable profiling
 job.setProfileEnabled(true);
 job.setProfileTaskRange(true, "0");
 job.setProfileTaskRange(false, "0");
 {code}
 When I run this job, some of my map tasks fail with this error message:
 {noformat}
 org.apache.hadoop.util.Shell$ExitCodeException: 
 /data/5/yarn/nm/usercache/hdfs/appcache/application_1394482121761_0012/container_1394482121761_0012_01_41/launch_container.sh:
  line 32: $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true 
 -Dhadoop.metrics.log.level=WARN   -Xmx825955249 -Djava.io.tmpdir=$PWD/tmp 
 -Dlog4j.configuration=container-log4j.properties 
 -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41
  -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
 ${mapreduce.task.profile.params} org.apache.hadoop.mapred.YarnChild 
 10.20.212.12 43135 attempt_1394482121761_0012_r_00_0 41 
 1>/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stdout
  
 2>/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stderr
  : bad substitution
 {noformat}
 It looks like ${mapreduce.task.profile.params} is not getting subbed in 
 correctly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5867) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy

2014-05-19 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002802#comment-14002802
 ] 

Devaraj K commented on MAPREDUCE-5867:
--

Thanks [~andrew.wang] for fixing this.

 Possible NPE in KillAMPreemptionPolicy related to 
 ProportionalCapacityPreemptionPolicy
 --

 Key: MAPREDUCE-5867
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5867
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Sunil G
Assignee: Sunil G
 Fix For: 3.0.0

 Attachments: MapReduce-5867-updated.patch, 
 MapReduce-5867-updated.patch, MapReduce-5867.2.patch, MapReduce-5867.3.patch, 
 Yarn-1980.1.patch


 I configured KillAMPreemptionPolicy for My Application Master and tried to 
 check preemption of queues.
 In one scenario I have seen the below NPE in my AM:
 2014-04-24 15:11:08,860 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM. 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.preemption.KillAMPreemptionPolicy.preempt(KillAMPreemptionPolicy.java:57)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:662)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:246)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:267)
   at java.lang.Thread.run(Thread.java:662)
 I was using 2.2.0 and merged MAPREDUCE-5189 to see how AM preemption works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)