[jira] [Commented] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793222#comment-13793222 ]

Hadoop QA commented on MAPREDUCE-4490:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12607976/MAPREDUCE-4490.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4113//console

This message is automatically generated.

> JVM reuse is incompatible with LinuxTaskController (and therefore
> incompatible with Security)
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-4490
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task-controller, tasktracker
> Affects Versions: 0.20.205.0, 1.0.3, 1.2.1
> Reporter: George Datskos
> Assignee: sam liu
> Priority: Critical
> Labels: patch
> Fix For: 1.2.1
>
> Attachments: MAPREDUCE-4490.patch
>
> When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks >
> 1) with more map tasks in a job than there are map slots in the cluster will
> result in immediate task failures for the second task in each JVM (and then
> the JVM exits). We have investigated this bug and the root cause is as
> follows. When using LinuxTaskController, the userlog directory for a task
> attempt (../userlogs/job/task-attempt) is created only on the first
> invocation (when the JVM is launched), because userlogs directories are
> created by the task-controller binary, which only runs *once* per JVM.
> Therefore, attempting to create log.index is guaranteed to fail with ENOENT,
> leading to immediate task failure and child JVM exit.
>
> {quote}
> 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting
> logging for a new task attempt_201207241401_0013_m_27_0 in the same JVM
> as that of the first task
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_06_0
> 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running child
> ENOENT: No such file or directory
>   at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
>   at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
>   at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
>   at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
>   at org.apache.hadoop.mapred.Child.main(Child.java:229)
> {quote}
>
> The above error occurs in a JVM which runs tasks 6 and 27. Task 6 goes
> smoothly. Then task 27 starts. The directory
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_027_0
> is never created, so when mapred.Child tries to write the log.index file for
> task 27, it fails with ENOENT because the attempt_201207241401_0013_m_027_0
> directory does not exist. Therefore, the second task in each JVM is
> guaranteed to fail (and then the JVM exits) every time when using
> LinuxTaskController. Note that this problem does not occur when using the
> DefaultTaskController, because there the userlogs directories are created for
> each task (not just for each JVM, as with LinuxTaskController). For each
> task, the TaskRunner calls the TaskController's createLogDir method before
> attempting to write out an index file.
>
> * DefaultTaskController#createLogDir: creates a log directory for each task
> * LinuxTaskController#createLogDir: does nothing
> ** the task-controller binary creates the log directory
>    [create_attempt_directories] (but only for the first task)
>
> Possible solution: add a new command to task-controller, *initialize task*,
> to create attempt directories. Call that command, with ShellCommandExecutor,
> in the LinuxTaskController#createLogDir method

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sam liu updated MAPREDUCE-4490:
-------------------------------

Fix Version/s: 1.2.1
Labels: patch (was: )
Target Version/s: 1.2.1
Affects Version/s: 1.2.1
Status: Patch Available (was: Open)

As the comments/description above explain, the root cause of this issue is that userlogs directories are created by the task-controller binary, which runs only once per JVM when using LinuxTaskController. So the main purpose of the patch is to add a new command to task-controller, "initialize task", to create attempt directories, and to invoke it, with ShellCommandExecutor, in the LinuxTaskController#createLogDir method. The main modifications are:

1. src/c++/task-controller/impl/task-controller.h: add a declaration for the new method initialize_task()
2. src/c++/task-controller/impl/task-controller.c: implement the new method initialize_task(), which invokes the existing method create_attempt_directories()
3. src/c++/task-controller/impl/main.c: allow the new method initialize_task() to be invoked from ShellCommandExecutor
4. src/mapred/org/apache/hadoop/mapred/LinuxTaskController.java: in method createLogDir(), invoke initialize_task() via ShellCommandExecutor to create the attempt directory before launching each task
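The per-task initialization described above can be sketched as follows. This is a minimal, self-contained stand-in using java.nio.file, not the actual LinuxTaskController or task-controller binary code: class and method names here are illustrative, and the real patch shells out to the setuid task-controller binary rather than creating the directory in Java.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical model of the per-task initialization the patch adds:
// createLogDir is called once per task attempt, before log.index is
// written, so the second (and later) tasks in a reused JVM no longer
// hit ENOENT on a missing attempt directory.
public class AttemptLogDirs {
    private final Path userlogRoot;

    public AttemptLogDirs(Path userlogRoot) {
        this.userlogRoot = userlogRoot;
    }

    // Creates ../userlogs/<job>/<attempt> for every attempt, not just
    // the first one launched in the JVM.
    public Path createLogDir(String jobId, String attemptId) throws IOException {
        Path attemptDir = userlogRoot.resolve(jobId).resolve(attemptId);
        Files.createDirectories(attemptDir);
        return attemptDir;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("userlogs");
        AttemptLogDirs dirs = new AttemptLogDirs(root);
        // First task in the JVM: directory exists either way.
        Path first = dirs.createLogDir("job_0001", "attempt_0001_m_000006_0");
        // Second task reusing the same JVM: without a per-task call,
        // this directory was never created and writing log.index failed.
        Path second = dirs.createLogDir("job_0001", "attempt_0001_m_000027_0");
        System.out.println(Files.isDirectory(first) && Files.isDirectory(second));
    }
}
```

In the actual patch, the body of createLogDir would instead run the task-controller binary's new "initialize task" command through ShellCommandExecutor, so the directory is created with the right ownership for the job's user.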
[jira] [Updated] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sam liu updated MAPREDUCE-4490:
-------------------------------

Priority: Critical (was: Major)
[jira] [Created] (MAPREDUCE-5580) OutOfMemoryError in ReduceTask shuffleInMemory
Kevin Beyer created MAPREDUCE-5580:
-----------------------------------

Summary: OutOfMemoryError in ReduceTask shuffleInMemory
Key: MAPREDUCE-5580
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5580
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: task
Affects Versions: 0.20.2
Reporter: Kevin Beyer

I have had several reduce tasks fail during the shuffle phase with the following error and stack trace (on CDH 4.1.2):

Error: java.lang.OutOfMemoryError: Java heap space
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1644)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1504)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1339)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1271)

I found many web posts that report the same problem, and a prior Hadoop issue that is already fixed (that one involved an int overflow problem).

The task had 1 GB of Java heap, and the mapred.job.shuffle.input.buffer.percent parameter in mapred-site.xml was set to the default of 0.7. This means that 1 GB * 0.7 = 717 MB of Java heap will hold map outputs that are no bigger than 717 / 4 = 179 MB each.

We were able to capture a heap dump of one reduce task. The heap contained 8 byte arrays that were 127 MB each. These byte arrays were all referenced by their own DataInputBuffer. Six of the buffers were referenced by the linked lists in ReduceTask$ReduceCopier.mapOutputsFilesInMemory. These six byte arrays consume 127 MB * 6 = 762 MB of the heap. Curiously, this 762 MB exceeds the 717 MB limit. The ShuffleRamManager.fullSize = 797966777 bytes = 761 MB, so something is a bit off in my original value of 717, but this is not the major source of trouble.

There are two more large byte arrays of 127 MB * 2 = 254 MB that are still in memory. These are referenced from DataInputBuffers that are referenced indirectly by the static Merger.MergeQueue instance.

One of them is referenced twice, by the 'key' and 'value' fields of the MergeQueue. These fields store the current minimum key and value by pointing at the full byte array of the map output plus a range of a few bytes in that array. They are needed during the active merge process, but not when the merge is complete. In my heap dump, the 'segments' list has been cleared, so no active merge is in progress. However, 'key' and 'value' are still set from the last merge pass. This pins one in-memory map output in memory, which can be as big as 0.7 / 4 = 17.5% of memory with default settings. When a merge phase is complete, these two fields should be set to null.

The second byte array is referenced via the MergeQueue.comparator RawComparator. In my case, this is a WritableComparator. This is most likely caused by this method:

  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    try {
      buffer.reset(b1, s1, l1);   // parse key1
      key1.readFields(buffer);
      buffer.reset(b2, s2, l2);   // parse key2
      key2.readFields(buffer);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
    return compare(key1, key2);   // compare them
  }

This causes the comparator to remember the last 'b2' byte array passed into compare(). That byte array could be an in-memory map output, which by default is 0.7 / 4 = 17.5% of memory. This code could have a finally { buffer.clear() } to drop the reference. Alternatively, the API could include a reset() call to clear such unnecessary state.

Given this information, we can see why we can easily cause an OOM error: by default we have 70% of RAM dedicated to map output, and we can have 17.5% * 2 = 35% of memory unaccounted for by the two references described. Even without accounting for any other memory overhead, we already have 70% + 35% = 105% of RAM occupied in the unlucky case that these two references point at the largest possible in-memory map outputs. There may be other leakage of these byte arrays, but these were all the large byte arrays in my heap dump. A test that makes many map outputs that are 0.7 / 4 = 17.5% of the reduce task heap can reliably recreate this problem, and perhaps find other unaccounted-for large byte arrays.
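The suggested finally-block fix for the comparator can be illustrated with a self-contained sketch. This models the reference-pinning pattern with plain JDK types and is not the actual Hadoop WritableComparator; the class name, the int-key encoding, and the holdsNoInput() probe are all illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Comparator;

// Model of the leak: the quoted compare() leaves its reusable input
// buffer pointing at the last byte array it parsed, pinning a
// potentially huge in-memory map output. Resetting the reference in a
// finally block (the suggested finally { buffer.clear() }) drops it.
public class IntKeyComparator implements Comparator<byte[]> {
    private static final byte[] EMPTY = new byte[0];
    private byte[] buf = EMPTY; // reusable "buffer" reference, as in WritableComparator

    private int readInt(byte[] b) throws IOException {
        buf = b; // parse via the shared buffer reference
        return new DataInputStream(new ByteArrayInputStream(buf)).readInt();
    }

    @Override
    public int compare(byte[] b1, byte[] b2) {
        try {
            int k1 = readInt(b1);
            int k2 = readInt(b2);
            return Integer.compare(k1, k2);
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            buf = EMPTY; // drop the reference so b2 is no longer pinned
        }
    }

    // Exposed so the no-leak property can be checked after a compare().
    public boolean holdsNoInput() {
        return buf == EMPTY;
    }
}
```

Without the finally block, buf would keep the last b2 alive between merge passes, which is exactly the second leaked 127 MB array observed in the heap dump above.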
[jira] [Moved] (MAPREDUCE-5579) Improve JobTracker web UI
[ https://issues.apache.org/jira/browse/MAPREDUCE-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers moved HADOOP-10038 to MAPREDUCE-5579:
-----------------------------------------------------

Affects Version/s: (was: 1.2.2) 1.2.2
Key: MAPREDUCE-5579 (was: HADOOP-10038)
Project: Hadoop Map/Reduce (was: Hadoop Common)

> Improve JobTracker web UI
> -------------------------
>
> Key: MAPREDUCE-5579
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5579
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.2.2
> Reporter: David Chen
> Attachments: jobdetails.png, jobtasks.png, jobtracker.png
>
> Users will often need to use the JobTracker web UI to debug or tune their
> jobs, in addition to checking the status of their jobs. The current web UI
> is cumbersome to navigate. The goal is to make the JobTracker web UI easier
> to navigate and present the data in a cleaner and more intuitive format.
[jira] [Commented] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit tests
[ https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793184#comment-13793184 ]

Hadoop QA commented on MAPREDUCE-3860:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12608100/MAPREDUCE-3860.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 28 new or modified test files.

{color:red}-1 javac{color}. The applied patch generated 1525 javac compiler warnings (more than the trunk's current 1524 warnings).

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}. The applied patch generated 16 release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-tools/hadoop-rumen hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  org.apache.hadoop.mapred.TestClusterMapReduceTestCase

The following test timeouts occurred in the same modules:

  org.apache.hadoop.mapreduce.v2.TestUberAM
  org.apache.hadoop.conf.TestNoDefaultsJobConf

The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests and hadoop-tools/hadoop-rumen.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4112//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4112//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4112//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4112//console

This message is automatically generated.

> [Rumen] Bring back the removed Rumen unit tests
> -----------------------------------------------
>
> Key: MAPREDUCE-3860
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tools/rumen
> Reporter: Ravi Gummadi
> Assignee: Andrey Klochkov
> Attachments: MAPREDUCE-3860.patch, rumen-test-data.tar.gz
>
> MAPREDUCE-3582 did not move some of the Rumen unit tests to the new folder,
> and then MAPREDUCE-3705 deleted those unit tests. These Rumen unit tests
> need to be brought back:
> TestZombieJob.java
> TestRumenJobTraces.java
> TestRumenFolder.java
> TestRumenAnonymization.java
> TestParsedLine.java
> TestConcurrentRead.java
[jira] [Updated] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit teststoo
[ https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Klochkov updated MAPREDUCE-3860:
---------------------------------------

Target Version/s: 3.0.0, 2.3.0
Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit tests
[ https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Klochkov updated MAPREDUCE-3860:
---------------------------------------

Summary: [Rumen] Bring back the removed Rumen unit tests (was: [Rumen] Bring back the removed Rumen unit teststoo)
[jira] [Updated] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit teststoo
[ https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Klochkov updated MAPREDUCE-3860:
---------------------------------------

Attachment: MAPREDUCE-3860.patch
            rumen-test-data.tar.gz

Attaching a patch and a tarball with gzip'ped test data. The robot wouldn't be able to run the tests.
[jira] [Commented] (MAPREDUCE-5387) Implement Signal.TERM on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793117#comment-13793117 ]

Andrey Klochkov commented on MAPREDUCE-5387:
--------------------------------------------

Indeed, [YARN-445] is related. Thanks to [~cnauroth] for pointing it out. I think I can put up a patch which sends Ctrl+C to all processes in the job object and makes Yarn use it as an analog of the TERM signal when running on Windows. That would be similar to how it's done with Ctrl+Break in [YARN-445].

> Implement Signal.TERM on Windows
> --------------------------------
>
> Key: MAPREDUCE-5387
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5387
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 3.0.0, 1-win, 2.1.0-beta
> Reporter: Ivan Mitic
> Assignee: Ivan Mitic
>
> Signal.TERM is currently not supported by Hadoop on the Windows platform.
> This is the tracking JIRA for the problem.
> A couple of things to keep in mind:
> - Support for process groups (JobObjects on Windows)
> - The solution should work for both Java and other streaming Hadoop apps
[jira] [Commented] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793046#comment-13793046 ]

Sandy Ryza commented on MAPREDUCE-5517:
---------------------------------------

The patch looks good to me other than a few stylistic nits:

{code}
+        || (numReduceTasks == 0 && conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0) <= sysMemSizeForUberSlot));
{code}

This line looks like it's over 80 characters.

{code}
-
+
{code}

False whitespace change.

{code}
+    // enable uber mode with 0 reducers no matter how much memory we assign to the reducer
+    conf = new Configuration();
+    conf.setBoolean(MRJobConfig.JOB_UBERTASK_ENABLE, true);
+    conf.setInt(MRJobConfig.NUM_REDUCES, 0); // actual number of reducers set to 0
+    conf.setInt(MRJobConfig.REDUCE_MEMORY_MB, 2048); // mapreduce.reduce.memory.mb set to 2048 MB, which is larger than yarn.app.mapreduce.am.resource.mb (1536 MB by default)
+    isUber = testUberDecision(conf);
+    Assert.assertTrue(isUber);
{code}

Spaces should be used instead of tabs.

> enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb
> to be less than yarn.app.mapreduce.am.resource.mb
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-5517
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.0.5-alpha
> Reporter: Siqi Li
> Priority: Minor
> Attachments: MAPREDUCE_5517_v3.patch.txt
>
> Since there is no reducer, the memory allocated to the reducer is irrelevant
> to enabling uber mode for a job
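The relaxed memory check the patch introduces can be sketched as a standalone predicate. The names follow the snippet quoted in the review above, but this is an illustrative model, not the real JobImpl uber-decision code: with zero reducers, only the map memory has to fit in the AM's uber slot, and mapreduce.reduce.memory.mb is ignored.

```java
// Hypothetical model of the uber-decision memory check after the patch.
public class UberDecision {
    static boolean smallMemory(long mapMemoryMb, long reduceMemoryMb,
                               int numReduceTasks, long sysMemSizeForUberSlot) {
        // Original rule: both map and reduce memory must fit in the uber slot.
        boolean bothFit = Math.max(mapMemoryMb, reduceMemoryMb) <= sysMemSizeForUberSlot;
        // Patched rule: with 0 reducers, only the map memory matters.
        boolean mapOnlyFits = numReduceTasks == 0 && mapMemoryMb <= sysMemSizeForUberSlot;
        return bothFit || mapOnlyFits;
    }

    public static void main(String[] args) {
        // Reduce memory 2048 MB exceeds the 1536 MB slot, but with 0
        // reducers the job is still uber-eligible.
        System.out.println(smallMemory(1024, 2048, 0, 1536)); // true
        // With at least one reducer, the oversized reduce memory blocks uber mode.
        System.out.println(smallMemory(1024, 2048, 1, 1536)); // false
    }
}
```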
[jira] [Assigned] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit teststoo
[ https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Klochkov reassigned MAPREDUCE-3860:
------------------------------------------

Assignee: Andrey Klochkov
[jira] [Commented] (MAPREDUCE-3859) CapacityScheduler incorrectly utilizes extra-resources of queue for high-memory jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792827#comment-13792827 ]

Mike Roark commented on MAPREDUCE-3859:
---------------------------------------

Correction: this is still an issue in CDH4.2. The fix is the same as in Sergey's comment for 4.1.2: https://issues.apache.org/jira/browse/MAPREDUCE-3859?focusedCommentId=13659278&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13659278

> CapacityScheduler incorrectly utilizes extra-resources of queue for
> high-memory jobs
> -------------------------------------------------------------------
>
> Key: MAPREDUCE-3859
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3859
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: capacity-sched
> Affects Versions: 1.0.0
> Reporter: Sergey Tryuber
> Assignee: Sergey Tryuber
> Fix For: 1.2.1
> Attachments: MAPREDUCE-3859_MR1_fix_and_test.patch.txt,
>              test-to-fail.patch.txt
>
> Imagine we have a queue A with a capacity of 10 slots and 20 as
> extra-capacity. Jobs which use 3 map slots will never consume more than 9
> slots, regardless of how many free slots there are on the cluster.
[jira] [Commented] (MAPREDUCE-5186) mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792828#comment-13792828 ]

Sangjin Lee commented on MAPREDUCE-5186:
----------------------------------------

Raising the priority. The default value of mapreduce.job.max.split.locations effectively renders CombineFileInputFormat DOA on any decent-sized cluster. Have others encountered this issue?

> mapreduce.job.max.split.locations causes some splits created by
> CombineFileInputFormat to fail
> ---------------------------------------------------------------
>
> Key: MAPREDUCE-5186
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5186
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv1, mrv2
> Affects Versions: 2.0.4-alpha
> Reporter: Sangjin Lee
> Priority: Critical
>
> CombineFileInputFormat can easily create splits that come from many
> different locations (during the last pass of creating "global" splits).
> However, we observe that this often runs afoul of the
> mapreduce.job.max.split.locations check done by JobSplitWriter.
> The default value for mapreduce.job.max.split.locations is 10, and with any
> decent-sized cluster, CombineFileInputFormat creates splits that are well
> above this limit.
[jira] [Updated] (MAPREDUCE-5186) mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated MAPREDUCE-5186:
-----------------------------------

Priority: Critical (was: Major)
[jira] [Commented] (MAPREDUCE-5541) Improved algorithm for whether need speculative task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792786#comment-13792786 ]

Benoy Antony commented on MAPREDUCE-5541:
-----------------------------------------

John,
1. Could you please make these parameters {SPECULATIVE_PROGRESS, SPECULATIVE_FACTOR} configurable?
2. Could you please share some test results indicating the improvement?

> Improved algorithm for whether need speculative task
> ----------------------------------------------------
>
> Key: MAPREDUCE-5541
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5541
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1
> Affects Versions: 1.2.1
> Reporter: zhaoyunjiong
> Assignee: zhaoyunjiong
> Fix For: 1.2.2
> Attachments: MAPREDUCE-5541-branch-1.2.patch
>
> Most of the time, tasks won't start running at the same time.
> In this case, hasSpeculativeTask in TaskInProgress does not work very well.
> Sometimes, some tasks have just started running and the scheduler already
> decides they need a speculative task to run.
> This wastes a lot of resources.
[jira] [Commented] (MAPREDUCE-5576) MR AM unregistration should be failed due to UnknownHostException on getting history url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792744#comment-13792744 ] Zhijie Shen commented on MAPREDUCE-5576: bq. IIUC, failing the unregistration fails the application. Yes, the current design is to fail the application when unregistration fails. What I mean is that an error getting the history url should not fail unregistration, because ApplicationMasterProtocol#finishApplicationMaster doesn't require the url to finish an application on the RM side. The error getting the history url may be caused by unavailability of the JHS, but can also be caused by misconfiguration (we actually ran into this issue). JHS HA will reduce the chance of error, but it will still happen. bq. If we decide to still go through with this change, we should probably fail the application early - before running the job and not after. Agreed. In fact, when the url is not available, the RM already has logic to log something. Maybe we want to enhance that log message. bq. A better approach might be to explicitly log or show that the JHS is down/inaccessible/mis-configured and setting the tracking URL has failed. I don't want to fail the application. As mentioned above, I don't think it makes sense that an error getting the history url should fail unregistration, and ultimately fail the application. > MR AM unregistration should be failed due to UnknownHostException on getting > history url > > > Key: MAPREDUCE-5576 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5576 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > Before RMCommunicator sends the request to the RM to finish the application, it > will try to get the JHS url, which may throw UnknownHostException. The > current code path skips sending the request to the RM when the exception is > raised, which does not sound like reasonable behavior, because the RM's unregistering an > AM is not affected by the tracking URL. The URL can be empty or null. > AFAIK, the impact of a null URL is that the URL redirecting users from the RM > web page to the JHS will be unavailable, and the job report will not show the URL > either. However, isn't that much better than failing an application because > of an UnknownHostException here? In any case, users can go to the JHS directly to find > the application history info. > Therefore, the reasonable code path here should be catching > UnknownHostException and setting historyUrl = null -- This message was sent by Atlassian JIRA (v6.1#6144)
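The handling proposed in the description above is small; a minimal self-contained sketch of catching UnknownHostException and falling back to a null tracking URL. Here resolveHistoryUrl is a hypothetical helper, not the actual RMCommunicator code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of the proposed fix: resolving the JHS address may
// throw UnknownHostException; unregistration should proceed with a null
// tracking URL instead of aborting (the RM does not require a URL).
public class HistoryUrlFallback {
    public static String resolveHistoryUrl(String host, int port) {
        try {
            InetAddress addr = InetAddress.getByName(host);
            return "http://" + addr.getHostName() + ":" + port;
        } catch (UnknownHostException e) {
            // Unregistration continues; the only cost is a missing
            // RM-to-JHS redirect link in the web UI and job report.
            return null;
        }
    }
}
```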
[jira] [Commented] (MAPREDUCE-5576) MR AM unregistration should be failed due to UnknownHostException on getting history url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792645#comment-13792645 ] Karthik Kambatla commented on MAPREDUCE-5576: - IIUC, failing the unregistration fails the application. I am not sure that is a good thing to do. It means we require the JHS to be running to be able to run jobs - it even implies that JHS HA is required in addition to RM HA for interruption-less submission of MR jobs. A better approach might be to explicitly log or show that the JHS is down/inaccessible/mis-configured and setting the tracking URL has failed. If we decide to still go through with this change, we should probably fail the application early - before running the job and not after. > MR AM unregistration should be failed due to UnknownHostException on getting > history url > > > Key: MAPREDUCE-5576 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5576 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > Before RMCommunicator sends the request to the RM to finish the application, it > will try to get the JHS url, which may throw UnknownHostException. The > current code path skips sending the request to the RM when the exception is > raised, which does not sound like reasonable behavior, because the RM's unregistering an > AM is not affected by the tracking URL. The URL can be empty or null. > AFAIK, the impact of a null URL is that the URL redirecting users from the RM > web page to the JHS will be unavailable, and the job report will not show the URL > either. However, isn't that much better than failing an application because > of an UnknownHostException here? In any case, users can go to the JHS directly to find > the application history info. > Therefore, the reasonable code path here should be catching > UnknownHostException and setting historyUrl = null -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-4579) TestTaskAttempt fails jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792534#comment-13792534 ] Hudson commented on MAPREDUCE-4579: --- FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #757 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/757/]) svn merge -c 1377943 FIXES: MAPREDUCE-4579. Split TestTaskAttempt into two so as to pass tests on jdk7. Contributed by Thomas Graves (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531047) * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttemptContainerRequest.java > TestTaskAttempt fails jdk7 > -- > > Key: MAPREDUCE-4579 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4579 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Fix For: 3.0.0, 2.0.2-alpha, 0.23.10 > > Attachments: MAPREDUCE-4579.patch > > > --- > Test set: org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt > --- > Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 7.205 sec > <<< > FAILURE!testAttemptContainerRequest(org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt) > Time elapsed: 0.032 sec <<< ERROR! 
> java.io.EOFException > at java.io.DataInputStream.readByte(DataInputStream.java:267) > at > org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308) > at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329) > at org.apache.hadoop.io.Text.readFields(Text.java:280) > at org.apache.hadoop.security.token.Token.readFields(Token.java:165) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-4571) TestHsWebServicesJobs fails on jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792531#comment-13792531 ] Hudson commented on MAPREDUCE-4571: --- FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #757 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/757/]) svn merge -c 1457061 FIXES: MAPREDUCE-4571. TestHsWebServicesJobs fails on jdk7. Contributed by Thomas Graves (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531024) * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/MockHistoryJobs.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesJobs.java > TestHsWebServicesJobs fails on jdk7 > --- > > Key: MAPREDUCE-4571 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4571 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: webapps >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Fix For: 2.1.0-beta, 0.23.10 > > Attachments: MAPREDUCE-4571.patch > > > TestHsWebServicesJobs fails on jdk7. > Tests run: 22, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.561 sec > <<< > FAILURE!testJobIdSlash(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobs) > Time elapsed: 0.334 sec <<< FAILURE! > java.lang.AssertionError: mapsTotal incorrect expected:<0> but was:<1> -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5414) TestTaskAttempt fails jdk7 with NullPointerException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792532#comment-13792532 ] Hudson commented on MAPREDUCE-5414: --- FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #757 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/757/]) svn merge -c 1520964 FIXES: MAPREDUCE-5414. TestTaskAttempt fails in JDK7 with NPE. Contributed by Nemon Lou (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531068) * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java > TestTaskAttempt fails jdk7 with NullPointerException > > > Key: MAPREDUCE-5414 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5414 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 2.0.5-alpha >Reporter: Nemon Lou >Assignee: Nemon Lou > Labels: java7 > Fix For: 0.23.10, 2.1.1-beta > > Attachments: MAPREDUCE-5414.patch, MAPREDUCE-5414.patch > > > Test case org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt fails > once in a while when i run all of them together. > {code:xml} > Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt > Tests run: 9, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 7.893 sec <<< > FAILURE! 
> Results : > Tests in error: > > testLaunchFailedWhileKilling(org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt) > > testContainerCleanedWhileRunning(org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt) > > testContainerCleanedWhileCommitting(org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt) > > testDoubleTooManyFetchFailure(org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt) > Tests run: 9, Failures: 0, Errors: 4, Skipped: 0 > {code} > But if i run a single test case,taking testContainerCleanedWhileRunning for > example,it will fail without doubt. > {code:xml} > classname="org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt" > name="testContainerCleanedWhileRunning"> > type="java.lang.NullPointerException">java.lang.NullPointerException > at org.apache.hadoop.security.token.Token.write(Token.java:216) > at > org.apache.hadoop.mapred.ShuffleHandler.serializeServiceData(ShuffleHandler.java:205) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createCommonContainerLaunchContext(TaskAttemptImpl.java:695) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createContainerLaunchContext(TaskAttemptImpl.java:751) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1309) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1282) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1009) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.testContainerCleanedWhileRunning(TestTaskAttempt.java:410) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49) >
[jira] [Commented] (MAPREDUCE-4716) TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792529#comment-13792529 ] Hudson commented on MAPREDUCE-4716: --- FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #757 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/757/]) svn merge -c 1457065 FIXES: MAPREDUCE-4716. TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7. Contributed by Thomas Graves (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531015) * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesJobsQuery.java > TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7 > > > Key: MAPREDUCE-4716 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4716 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Fix For: 2.1.0-beta, 0.23.10 > > Attachments: MAPREDUCE-4716.patch > > > Using jdk7 TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails. > It looks like the string changed from "const class" to "constant" in jdk7. > Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.713 sec > <<< FAILURE! > testJobsQueryStateInvalid(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery) > Time elapsed: 0.371 sec <<< FAILURE! 
> java.lang.AssertionError: exception message doesn't match, got: No enum > constant org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > expected: No enum const class > org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > at org.junit.Assert.fail(Assert.java:91)at > org.junit.Assert.assertTrue(Assert.java:43) > at > org.apache.hadoop.yarn.webapp.WebServicesTestUtils.checkStringMatch(WebServicesTestUtils.java:77) > at > org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery.testJobsQueryStateInvalid(TestHsWebServicesJobsQuery.java:286) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5425) Junit in TestJobHistoryServer failing in jdk 7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792530#comment-13792530 ] Hudson commented on MAPREDUCE-5425: --- FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #757 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/757/]) svn merge -c 1511464 FIXES: MAPREDUCE-5425. Junit in TestJobHistoryServer failing in jdk 7. Contributed by Robert Parker (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531029) * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryServer.java > Junit in TestJobHistoryServer failing in jdk 7 > -- > > Key: MAPREDUCE-5425 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5425 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 2.0.4-alpha >Reporter: Ashwin Shankar >Assignee: Robert Parker > Fix For: 3.0.0, 0.23.10, 2.1.1-beta > > Attachments: MAPREDUCE-5425-2.patch, MAPREDUCE-5425-3.patch, > MAPREDUCE-5425.patch > > > We get the following exception when we run the unit tests of > TestJobHistoryServer with jdk 7: > Caused by: java.net.BindException: Problem binding to [0.0.0.0:10033] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:719) > at org.apache.hadoop.ipc.Server.bind(Server.java:423) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:535) > at org.apache.hadoop.ipc.Server.(Server.java:2202) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:901) > at > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:505) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:480) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:746) > at > org.apache.hadoop.mapreduce.v2.hs.server.HSAdminServer.serviceInit(HSAdminServer.java:100) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > This is happening because testMainMethod starts the history server and doesn't > stop it. This worked in jdk 6 because tests executed sequentially and this > test was the last one, so it didn't affect other tests, but in jdk 7 it fails. -- This message was sent by Atlassian JIRA (v6.1#6144)
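The failure mode analyzed above reduces to a teardown pattern: a test that binds a port must release it, because JDK 7 may run later tests in any order. A self-contained sketch with a stand-in server (FakeServer is hypothetical; the real fix stops JobHistoryServer in the test):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch of the teardown discipline the fix implies: the port is only
// bindable again after the earlier "test" stops its server. Skipping
// stop() is exactly what produced the JDK 7 BindException above.
public class ServerTeardownDemo {
    static class FakeServer {
        ServerSocket socket;
        void start(int port) throws IOException { socket = new ServerSocket(port); }
        void stop() throws IOException { if (socket != null) socket.close(); }
        boolean isRunning() { return socket != null && !socket.isClosed(); }
    }

    public static boolean portReusableAfterStop() {
        try {
            FakeServer first = new FakeServer();
            first.start(0);                          // ephemeral port
            int port = first.socket.getLocalPort();
            first.stop();                            // the step the failing test omitted
            FakeServer second = new FakeServer();
            second.start(port);                      // succeeds because the port was released
            boolean ok = second.isRunning();
            second.stop();
            return ok;
        } catch (IOException e) {
            return false;
        }
    }
}
```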
[jira] [Updated] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam liu updated MAPREDUCE-4490: --- Attachment: MAPREDUCE-4490.patch Attached patch works well in my local environment and could resolve current issue. Any feedback is welcome! > JVM reuse is incompatible with LinuxTaskController (and therefore > incompatible with Security) > - > > Key: MAPREDUCE-4490 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task-controller, tasktracker >Affects Versions: 0.20.205.0, 1.0.3 >Reporter: George Datskos >Assignee: sam liu > Attachments: MAPREDUCE-4490.patch > > > When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > > 1) with more map tasks in a job than there are map slots in the cluster will > result in immediate task failures for the second task in each JVM (and then > the JVM exits). We have investigated this bug and the root cause is as > follows. When using LinuxTaskController, the userlog directory for a task > attempt (../userlogs/job/task-attempt) is created only on the first > invocation (when the JVM is launched) because userlogs directories are > created by the task-controller binary which only runs *once* per JVM. > Therefore, attempting to create log.index is guaranteed to fail with ENOENT > leading to immediate task failure and child JVM exit. 
> {quote} > 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting > logging for a new task attempt_201207241401_0013_m_27_0 in the same JVM > as that of the first task > /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_06_0 > 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running > child > ENOENT: No such file or directory > at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method) > at > org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161) > at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296) > at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369) > at org.apache.hadoop.mapred.Child.main(Child.java:229) > {quote} > The above error occurs in a JVM which runs tasks 6 and 27. Task6 goes > smoothly. Then Task27 starts. The directory > /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_027_0 > is never created so when mapred.Child tries to write the log.index file for > Task27, it fails with ENOENT because the > attempt_201207241401_0013_m_027_0 directory does not exist. Therefore, > the second task in each JVM is guaranteed to fail (and then the JVM exits) > every time when using LinuxTaskController. Note that this problem does not > occur when using the DefaultTaskController because the userlogs directories > are created for each task (not just for each JVM as with LinuxTaskController). > For each task, the TaskRunner calls the TaskController's createLogDir method > before attempting to write out an index file. > * DefaultTaskController#createLogDir: creates log directory for each task > * LinuxTaskController#createLogDir: does nothing > ** task-controller binary creates log directory [create_attempt_directories] > (but only for the first task) > Possible Solution: add a new command to task-controller *initialize task* to > create attempt directories. 
Call that command, with ShellCommandExecutor, in > the LinuxTaskController#createLogDir method -- This message was sent by Atlassian JIRA (v6.1#6144)
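A minimal sketch of the effect the proposed fix aims for: the per-attempt userlog directory is created for every task, not just the first task in a reused JVM. The real patch would route this through a new task-controller "initialize task" command invoked via ShellCommandExecutor, so the directory is owned by the job's user; a plain mkdirs stands in for that privileged call here.

```java
import java.io.File;

// Hypothetical stand-in for LinuxTaskController#createLogDir after the fix:
// called once per task (not once per JVM), so the second task in a reused
// JVM gets its own attempt directory and log.index can be written.
public class AttemptLogDirDemo {
    public static File createLogDir(File userlogs, String jobId, String attemptId) {
        File attemptDir = new File(new File(userlogs, jobId), attemptId);
        // Idempotent: creating an already-existing directory is a no-op.
        attemptDir.mkdirs();
        return attemptDir;
    }
}
```

With this in place, the ENOENT in TaskLog.writeToIndexFile cannot occur for the second task, because its attempt directory exists before log.index is written.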
[jira] [Assigned] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam liu reassigned MAPREDUCE-4490: -- Assignee: sam liu > JVM reuse is incompatible with LinuxTaskController (and therefore > incompatible with Security) > - > > Key: MAPREDUCE-4490 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task-controller, tasktracker >Affects Versions: 0.20.205.0, 1.0.3 >Reporter: George Datskos >Assignee: sam liu > > When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > > 1) with more map tasks in a job than there are map slots in the cluster will > result in immediate task failures for the second task in each JVM (and then > the JVM exits). We have investigated this bug and the root cause is as > follows. When using LinuxTaskController, the userlog directory for a task > attempt (../userlogs/job/task-attempt) is created only on the first > invocation (when the JVM is launched) because userlogs directories are > created by the task-controller binary which only runs *once* per JVM. > Therefore, attempting to create log.index is guaranteed to fail with ENOENT > leading to immediate task failure and child JVM exit. 
> {quote} > 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting > logging for a new task attempt_201207241401_0013_m_27_0 in the same JVM > as that of the first task > /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_06_0 > 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running > child > ENOENT: No such file or directory > at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method) > at > org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161) > at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296) > at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369) > at org.apache.hadoop.mapred.Child.main(Child.java:229) > {quote} > The above error occurs in a JVM which runs tasks 6 and 27. Task6 goes > smoothly. Then Task27 starts. The directory > /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_027_0 > is never created so when mapred.Child tries to write the log.index file for > Task27, it fails with ENOENT because the > attempt_201207241401_0013_m_027_0 directory does not exist. Therefore, > the second task in each JVM is guaranteed to fail (and then the JVM exits) > every time when using LinuxTaskController. Note that this problem does not > occur when using the DefaultTaskController because the userlogs directories > are created for each task (not just for each JVM as with LinuxTaskController). > For each task, the TaskRunner calls the TaskController's createLogDir method > before attempting to write out an index file. > * DefaultTaskController#createLogDir: creates log directory for each task > * LinuxTaskController#createLogDir: does nothing > ** task-controller binary creates log directory [create_attempt_directories] > (but only for the first task) > Possible Solution: add a new command to task-controller *initialize task* to > create attempt directories. 
Call that command, with ShellCommandExecutor, in > the LinuxTaskController#createLogDir method -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (MAPREDUCE-5578) Miscellaneous Fair Scheduler speedups
Sandy Ryza created MAPREDUCE-5578: - Summary: Miscellaneous Fair Scheduler speedups Key: MAPREDUCE-5578 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5578 Project: Hadoop Map/Reduce Issue Type: Improvement Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza I ran the Fair Scheduler's core scheduling loop through a profiler and identified a bunch of minimally invasive changes that can shave off a few milliseconds. The main one is demoting a couple of INFO log messages to DEBUG, which brought my benchmark down from 16000 ms to 6000 ms. A few others (which had far less of an impact) were: * Most of the time in comparisons was being spent in Math.signum. I switched this to direct ifs and elses, and it halved the percentage of time spent in comparisons. * I removed some unnecessary instantiations of Resource objects. * I made it so that queues' usage isn't recalculated from the applications up each time getResourceUsage is called. -- This message was sent by Atlassian JIRA (v6.1#6144)
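The Math.signum point can be illustrated with a self-contained sketch (not the actual Fair Scheduler comparator): both methods produce the same ordering, but the second branches directly instead of computing a difference and pushing it through signum.

```java
// Before/after sketch of the comparator micro-optimization described above.
public class ShareComparator {
    // Before: difference converted through Math.signum.
    public static int compareWithSignum(double a, double b) {
        return (int) Math.signum(a - b);
    }

    // After: direct branches; same ordering for ordinary values, no
    // signum call or extra floating-point work.
    public static int compareDirect(double a, double b) {
        if (a < b) return -1;
        if (a > b) return 1;
        return 0;
    }
}
```

Note the subtraction form can also misbehave on extreme values (overflow to infinity, NaN propagation), which is another reason direct comparison is the idiomatic choice.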