[jira] [Updated] (MAPREDUCE-5867) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/MAPREDUCE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj K updated MAPREDUCE-5867:
---------------------------------
Hadoop Flags: Reviewed

+1, patch looks good to me.

Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy

Key: MAPREDUCE-5867
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5867
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Sunil G
Assignee: Sunil G
Attachments: MapReduce-5867-updated.patch, MapReduce-5867-updated.patch, MapReduce-5867.2.patch, MapReduce-5867.3.patch, Yarn-1980.1.patch

I configured KillAMPreemptionPolicy for my Application Master and tried to check preemption of queues. In one scenario I have seen the below NPE in my AM:

2014-04-24 15:11:08,860 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM.
java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.v2.app.rm.preemption.KillAMPreemptionPolicy.preempt(KillAMPreemptionPolicy.java:57)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:662)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:246)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:267)
    at java.lang.Thread.run(Thread.java:662)

I was using 2.2.0 and merged MAPREDUCE-5189 to see how AM preemption works.

--
This message was sent by Atlassian JIRA (v6.2#6252)
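A dependency-free sketch of the defensive checks such a fix would need: a preemption message produced by ProportionalCapacityPreemptionPolicy can plausibly carry only a strict contract, only a negotiable one, or neither, so each part must be null-checked before use. The types and field names below are simplified stand-ins for illustration, not the real YARN API.

```java
// Simplified stand-ins for the YARN preemption types (names are
// illustrative only; the real PreemptionMessage API differs).
import java.util.Collections;
import java.util.Set;

public class KillAMPreemptionSketch {
    public static class PreemptionMessage {
        public Set<String> strictContainers;      // may be null
        public Set<String> negotiableContainers;  // may be null
    }

    // Null-safe selection of containers to kill: never dereference a
    // contract that the RM did not include in this heartbeat's message.
    public static Set<String> containersToKill(PreemptionMessage msg) {
        if (msg == null) {
            return Collections.emptySet();
        }
        if (msg.strictContainers != null) {
            return msg.strictContainers;
        }
        if (msg.negotiableContainers != null) {
            return msg.negotiableContainers;
        }
        return Collections.emptySet();  // nothing to preempt this round
    }
}
```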
[jira] [Updated] (MAPREDUCE-5867) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/MAPREDUCE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj K updated MAPREDUCE-5867:
---------------------------------
Resolution: Fixed
Fix Version/s: 3.0.0
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks [~sunilg]

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sam liu updated MAPREDUCE-4490:
-------------------------------
Status: Patch Available (was: Open)

JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)

Key: MAPREDUCE-4490
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: task-controller, tasktracker
Affects Versions: 1.2.1, 1.0.3, 0.20.205.0
Reporter: George Datskos
Assignee: sam liu
Priority: Critical
Labels: patch
Fix For: 1.2.1
Attachments: MAPREDUCE-4490.patch, MAPREDUCE-4490.patch, MAPREDUCE-4490.patch

When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 1) with more map tasks in a job than there are map slots in the cluster will result in immediate task failures for the second task in each JVM (and then the JVM exits). We have investigated this bug, and the root cause is as follows. When using LinuxTaskController, the userlog directory for a task attempt (../userlogs/job/task-attempt) is created only on the first invocation (when the JVM is launched), because userlogs directories are created by the task-controller binary, which only runs *once* per JVM. Therefore, attempting to create log.index is guaranteed to fail with ENOENT, leading to immediate task failure and child JVM exit.
{quote}
2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting logging for a new task attempt_201207241401_0013_m_27_0 in the same JVM as that of the first task /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_06_0
2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running child
ENOENT: No such file or directory
    at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
    at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
    at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
    at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
    at org.apache.hadoop.mapred.Child.main(Child.java:229)
{quote}

The above error occurs in a JVM which runs tasks 6 and 27. Task 6 goes smoothly. Then task 27 starts. The directory /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_027_0 is never created, so when mapred.Child tries to write the log.index file for task 27, it fails with ENOENT, because the attempt_201207241401_0013_m_027_0 directory does not exist. Therefore, the second task in each JVM is guaranteed to fail (and then the JVM exits) every time when using LinuxTaskController. Note that this problem does not occur when using the DefaultTaskController, because the userlogs directories are created for each task (not just for each JVM, as with LinuxTaskController). For each task, the TaskRunner calls the TaskController's createLogDir method before attempting to write out an index file.

* DefaultTaskController#createLogDir: creates a log directory for each task
* LinuxTaskController#createLogDir: does nothing
** task-controller binary creates the log directory [create_attempt_directories] (but only for the first task)

Possible solution: add a new command to task-controller, *initialize task*, to create attempt directories. Call that command, via ShellCommandExecutor, in the LinuxTaskController#createLogDir method.

--
This message was sent by Atlassian JIRA (v6.2#6252)
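The proposed fix can be pictured with a small, self-contained sketch: an idempotent createLogDir that makes sure the per-attempt log directory exists on every task launch, not only on the first task of a reused JVM. In the real patch this would shell out to the setuid task-controller binary via ShellCommandExecutor (so directory ownership is correct under security); here we create the directory directly, and all names are illustrative.

```java
// Illustrative only: the real fix must run through the task-controller
// binary so the directory is owned by the job's user.
import java.io.File;

public class CreateLogDirSketch {
    // Idempotent: succeeds whether this is the first task in the JVM or
    // a later one reusing it, which is exactly what the bug requires.
    public static boolean createAttemptLogDir(File userlogRoot,
                                              String jobId, String attemptId) {
        File attemptDir = new File(new File(userlogRoot, jobId), attemptId);
        return attemptDir.isDirectory() || attemptDir.mkdirs();
    }
}
```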
[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them
[ https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gera Shegalov updated MAPREDUCE-5044:
-------------------------------------
Attachment: MAPREDUCE-5044.v06.patch

v06 utilizes the signalContainer API provided by YARN-1515.v08.patch.

Have AM trigger jstack on task attempts that timeout before killing them

Key: MAPREDUCE-5044
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mr-am
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Gera Shegalov
Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png

When an AM expires a task attempt, it would be nice if it triggered a jstack output via SIGQUIT before killing the task attempt. This would be invaluable for helping users debug their hung tasks, especially if they do not have shell access to the nodes.

--
This message was sent by Atlassian JIRA (v6.2#6252)
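The idea can be sketched as a two-step signal sequence: deliver SIGQUIT first (a HotSpot JVM responds by printing a full thread dump, i.e. a jstack, to its stdout without exiting), wait a grace period for the dump to reach the task's log, then SIGKILL. The patch routes this through YARN-1515's signalContainer API; the direct kill(1) invocations below are only an illustration and assume a POSIX host.

```java
// Illustration only: the real change signals through the NodeManager,
// not by exec'ing kill(1) from the AM.
import java.io.IOException;

public class JstackOnTimeoutSketch {
    public static void signalThenKill(long pid, long graceMillis)
            throws IOException, InterruptedException {
        // SIGQUIT: a JVM dumps all thread stacks to stdout and keeps running.
        new ProcessBuilder("kill", "-QUIT", Long.toString(pid)).start().waitFor();
        Thread.sleep(graceMillis);  // let the dump flush to the task log
        // SIGKILL: now actually terminate the expired attempt.
        new ProcessBuilder("kill", "-KILL", Long.toString(pid)).start().waitFor();
    }
}
```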
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001941#comment-14001941 ]

Jason Lowe commented on MAPREDUCE-5844:
---------------------------------------
Yes, the always-full-but-churning queue scenario can lead to an unfortunate preemption case, because the RM and AM can't know cluster resources will free up imminently. I think it's reasonable to add a configurable delay before preempting a reducer when a map retroactively fails. That way users can tune based on their cluster usage -- if they have a heavily used, high-churn cluster, they can tune the parameter based on the average time for a container to free up in their queue. Other users who rarely run in this mode, or for whom the delay would be intolerable, can set this to zero to avoid unnecessary delays in processing fetch-failed maps when there is no current headroom.

Reducer Preemption is too aggressive

Key: MAPREDUCE-5844
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh

We observed cases where reducer preemption makes the job finish much later, and the preemption does not seem to be necessary, since after preemption both the preempted reducer and the mapper are assigned immediately -- meaning that there was already enough space for the mapper. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. Preemption is triggered if the following is true:

{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}

where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. The original idea apparently was that if headroom is not big enough for the new mapper requests, reducers should be preempted. This would work if the job is alone in the cluster. Once we have queues, the headroom calculation becomes more complicated, and it would require a separate headroom calculation per queue/job. So, as a result, the headroom variable is essentially given up on currently: *headroom is always set to 0*. What this implies is that preemption becomes very aggressive, not considering whether there is enough space for the mappers or not.

--
This message was sent by Atlassian JIRA (v6.2#6252)
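The trigger condition and the configurable delay discussed in the comment can be restated in a short, dependency-free sketch. Only the inequality itself comes from the issue description; the delay parameter and all method names are made up here to illustrate the proposal.

```java
public class PreemptReducesSketch {
    // The issue's inequality: headroom + am*|m| + pr*|r| < mapResourceRequest.
    // With headroom pinned to 0 in current code, this fires very eagerly.
    public static boolean shortageExists(long headroom, int assignedMaps,
            long mapSize, int preemptingReducers, long reduceSize,
            long mapResourceRequest) {
        return headroom + (long) assignedMaps * mapSize
                + (long) preemptingReducers * reduceSize < mapResourceRequest;
    }

    // Proposed refinement: only preempt once the shortage has persisted for
    // delayMs; delayMs == 0 restores today's immediate behavior.
    public static boolean shouldPreempt(boolean shortage, long shortageSinceMs,
            long nowMs, long delayMs) {
        return shortage && nowMs - shortageSinceMs >= delayMs;
    }
}
```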
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002007#comment-14002007 ]

Karthik Kambatla commented on MAPREDUCE-5844:
---------------------------------------------
And we can start with a default value of zero until we see clear evidence that having a higher number doesn't hurt.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002096#comment-14002096 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-5309:
----------------------------------------------------
Haven't looked very carefully, but scanned through the patch and it seems reasonable. Can you post a summary of what the patch does, for posterity? Tx.

2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server

Key: MAPREDUCE-5309
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Rushabh S Shah
Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, MAPREDUCE-5309-v4.patch, MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist

When the 2.0.4 JobHistoryParser tries to parse a job history file generated by Hadoop 2.0.3, the JobHistoryParser throws an error:

java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters
    at org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58)
    at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
    at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142)
    at com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

Test code and the job history file are attached.

Test code:

package com.twitter.somepackagel;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser;
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo;
import org.junit.Test;
import
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated MAPREDUCE-5309:
--------------------------------------
Status: Open (was: Patch Available)

Current patch has a typo in one of the log statements.
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated MAPREDUCE-5309:
--------------------------------------
Attachment: MAPREDUCE-5309-v5.patch

Initially EventReader#reader was initialized like:

this.reader = new SpecificDatumReader(schema, schema);

This assumed the reader schema and writer schema are the same. But when the schema was upgraded from 2.0.3 to 2.0.4, new fields were added in 2.0.4 that were not present in 2.0.3. When the parser tried to parse 2.0.3 logs (which don't have the new fields), it returned with errors. So basically we need to differentiate between the new schema and the schema of the input jhist files, and Avro will do the rest of the mapping by field name. For the fields that were recently added, we need to assign default values, so that when we are parsing old-schema jhist files, the parser will assign the default value.

[~vinodkv]: I hope this helps.
[~viraj]: Yes, this patch will parse both 0.23.x and 2.4.x logs.
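The resolution rule the patch relies on can be shown with a minimal, dependency-free sketch: map fields by name against what the writer actually wrote, and fill any field the old writer never knew about from the reader's declared default. Avro's SpecificDatumReader, given separate writer and reader schemas, performs this for real; the Map-based model below is only an illustration.

```java
// Illustration of Avro-style schema resolution with plain maps:
// writtenRecord models a 2.0.3-era record, readerDefaults the defaults
// declared for fields added in 2.0.4.
import java.util.HashMap;
import java.util.Map;

public class SchemaResolveSketch {
    public static Map<String, Object> resolve(Map<String, Object> writtenRecord,
                                              Map<String, Object> readerDefaults) {
        // Start from the reader's defaults (the newly added fields)...
        Map<String, Object> record = new HashMap<>(readerDefaults);
        // ...then let every field the old writer produced override them.
        record.putAll(writtenRecord);
        return record;
    }
}
```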
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated MAPREDUCE-5309:
--------------------------------------
Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-5867) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/MAPREDUCE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002373#comment-14002373 ]

Andrew Wang commented on MAPREDUCE-5867:
----------------------------------------
Hey Devaraj, I think TestKillAMPreemptionPolicy.java was committed with CRLFs rather than LFs, which messes up {{git diff}} for those of us using the git mirror. Do you mind fixing this? Thanks.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5867) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/MAPREDUCE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002412#comment-14002412 ] Andrew Wang commented on MAPREDUCE-5867: Actually, never mind, I fixed it myself. I learned something new about SVN: apparently we should run {{svn propset svn:eol-style native file}} on new files (thanks cmccabe for the tip). I ran {{dos2unix}} to convert the newlines too.
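The fix described above can be sketched as shell commands. The file name below is a stand-in, {{tr -d}} is used as a portable equivalent of {{dos2unix}}, and the {{svn propset}} step assumes an SVN working copy, so it is shown commented out:

```shell
# Create a sample file with CRLF line endings (stand-in for the committed test file).
printf 'line one\r\nline two\r\n' > sample.java

# Strip the carriage returns, converting CRLF to LF (what dos2unix does).
tr -d '\r' < sample.java > sample.java.lf
mv sample.java.lf sample.java

# In an SVN working copy, mark the file so checkouts get native line endings:
#   svn propset svn:eol-style native sample.java
```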
[jira] [Commented] (MAPREDUCE-5790) Default map hprof profile options do not work
[ https://issues.apache.org/jira/browse/MAPREDUCE-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002782#comment-14002782 ] Ming Ma commented on MAPREDUCE-5790: It appears https://issues.apache.org/jira/browse/MAPREDUCE-5650 set
{code}
<name>mapreduce.task.profile.map.params</name>
<value>${mapreduce.task.profile.params}</value>
{code}
The reading code doesn't handle it. If you remove the default setting, things work fine. Perhaps we can leave the default value empty so profiling works out of the box. Default map hprof profile options do not work - Key: MAPREDUCE-5790 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5790 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.3.0 Environment: java version 1.6.0_31 Java(TM) SE Runtime Environment (build 1.6.0_31-b04) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) Reporter: Andrew Wang I have an MR job doing the following:
{code}
Job job = Job.getInstance(conf);
// Enable profiling
job.setProfileEnabled(true);
job.setProfileTaskRange(true, "0");
job.setProfileTaskRange(false, "0");
{code}
When I run this job, some of my map tasks fail with this error message:
{noformat}
org.apache.hadoop.util.Shell$ExitCodeException: /data/5/yarn/nm/usercache/hdfs/appcache/application_1394482121761_0012/container_1394482121761_0012_01_41/launch_container.sh: line 32: $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx825955249 -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA ${mapreduce.task.profile.params} org.apache.hadoop.mapred.YarnChild 10.20.212.12 43135 attempt_1394482121761_0012_r_00_0 41 1>/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stdout 2>/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stderr : bad substitution
{noformat}
It looks like ${mapreduce.task.profile.params} is not getting subbed in correctly.
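Based on Ming Ma's comment above, the problematic default and the suggested fix would look roughly like the following {{mapred-site.xml}} fragment. This is a sketch of the idea, not the exact committed change:

```xml
<!-- Problematic default (per the comment, introduced by MAPREDUCE-5650): the
     ${mapreduce.task.profile.params} placeholder ends up unexpanded on the
     task command line, causing the "bad substitution" failure. -->
<property>
  <name>mapreduce.task.profile.map.params</name>
  <value>${mapreduce.task.profile.params}</value>
</property>

<!-- Suggested fix: leave the default empty so per-job profile options
     work out of the box. -->
<property>
  <name>mapreduce.task.profile.map.params</name>
  <value></value>
</property>
```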
[jira] [Commented] (MAPREDUCE-5867) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/MAPREDUCE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002802#comment-14002802 ] Devaraj K commented on MAPREDUCE-5867: Thanks [~andrew.wang] for fixing this.