[jira] [Commented] (MAPREDUCE-4474) TestDistributedShell.testDSShell fails on CentOS 6 because of high virtual memory usage
[ https://issues.apache.org/jira/browse/MAPREDUCE-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431607#comment-13431607 ] Ilya Katsov commented on MAPREDUCE-4474: This patch is for 0.23. It cannot be applied to trunk. TestDistributedShell.testDSShell fails on CentOS 6 because of high virtual memory usage --- Key: MAPREDUCE-4474 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4474 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 0.23.3 Environment: CentOS 6 Reporter: Ilya Katsov Labels: test Attachments: MAPREDUCE-4474-branch-0.23.patch TestDistributedShell.testDSShell fails on CentOS 6 because of high virtual memory usage: {code} 2012-07-24 04:50:46,563 INFO [AsyncDispatcher event handler] rmapp.RMAppImpl (RMAppImpl.java:transition(559)) - Application application_1343091034814_0001 failed 1 times due to AM Container for appattempt_1343091034814_0001_01 exited with exitCode: 143 due to: Container [pid=6146,containerID=container_1343091034814_0001_01_01] is running beyond virtual memory limits. Current usage: 82.4mb of 512.0mb physical memory used; 1.1gb of 1.0gb virtual memory used. Killing container. Dump of the process-tree for container_1343091034814_0001_01_01 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 6146 5773 6146 6146 (bash) 2 0 108613632 340 /bin/bash -c /usr/java/jdk1.6.0_33/jre/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 128 -- {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
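The two limits in this kill message are related. Here is a minimal sketch. It assumes the NodeManager derives a container's virtual memory limit by multiplying the physical allocation by yarn.nodemanager.vmem-pmem-ratio, and that the default ratio is 2.1. Under those assumptions, 512 MB × 2.1 is about 1.05 GB, which the log prints as "1.0gb". The helper names are hypothetical, and the kill check is simplified.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumed default of yarn.nodemanager.vmem-pmem-ratio; clusters may override it.
VMEM_PMEM_RATIO = 2.1


def vmem_limit_bytes(pmem_limit_bytes, ratio=VMEM_PMEM_RATIO):
    """Hypothetical helper: virtual memory limit derived from the physical allocation."""
    return int(pmem_limit_bytes * ratio)


def over_vmem_limit(vmem_usage_bytes, pmem_limit_bytes, ratio=VMEM_PMEM_RATIO):
    """Simplified check: is the whole process tree over its virtual memory limit?

    The real container monitor may apply extra rules that are not modelled here.
    """
    return vmem_usage_bytes > vmem_limit_bytes(pmem_limit_bytes, ratio)


am_limit = vmem_limit_bytes(512 * MIB)
print(f"{am_limit / GIB:.1f}gb")  # prints 1.0gb, as in the log above
print(over_vmem_limit(int(1.1 * GIB), 512 * MIB))  # prints True
```

With the same assumed ratio, the 1.5gb allocation in MAPREDUCE-4535 gives a limit of about 3.15 GB. That matches the "3.1gb" limit reported there.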
[jira] [Updated] (MAPREDUCE-4535) Test failures with Container .. is running beyond virtual memory limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Katsov updated MAPREDUCE-4535: --- Attachment: MAPREDUCE-4535-branch-0.23.patch Test failures with Container .. is running beyond virtual memory limits - Key: MAPREDUCE-4535 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4535 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 0.23.3 Reporter: Ilya Katsov Attachments: MAPREDUCE-4535-branch-0.23.patch Tests org.apache.hadoop.tools.TestHadoopArchives.{testRelativePath,testPathWithSpaces} fail with the following message: {code} Container [pid=7785,containerID=container_1342495768864_0001_01_01] is running beyond virtual memory limits. Current usage: 143.6mb of 1.5gb physical memory used; 3.4gb of 3.1gb virtual memory used. Killing container. Dump of the process-tree for container_1342495768864_0001_01_01 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 7797 7785 7785 7785 (java) 573 38 3517018112 36421 /usr/java/jdk1.6.0_33/jre/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster {code} The problem is not reliably reproducible, but setting MALLOC_ARENA_MAX resolves it.
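glibc's malloc reads MALLOC_ARENA_MAX when a process starts and uses it to cap how many malloc arenas a multi-threaded process may create. Newer glibc releases create extra per-thread arenas. On 64-bit systems each of these typically reserves 64 MB of address space. That can inflate a JVM's virtual size well beyond its heap while barely changing resident memory, which would be consistent with the numbers in these logs.

Here is a minimal sketch of launching a child process with the variable set. The helper name and the value 4 are illustrative assumptions, not taken from the attached patch. This is also not how the NodeManager assembles a container's launch environment.

```python
import os
import subprocess
import sys


def run_with_arena_cap(cmd, arena_max=4):
    """Hypothetical helper: run cmd with glibc's MALLOC_ARENA_MAX set.

    The default of 4 is an illustrative assumption; tune it for the workload.
    """
    env = dict(os.environ)  # inherit the parent's environment
    env["MALLOC_ARENA_MAX"] = str(arena_max)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Usage: confirm that the child process sees the variable.
result = run_with_arena_cap(
    [sys.executable, "-c", "import os; print(os.environ['MALLOC_ARENA_MAX'])"]
)
print(result.stdout.strip())  # prints 4
```

In a Hadoop deployment, the variable also has to reach the container's launch environment. The mechanism the attached patch uses is not shown here.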
[jira] [Commented] (MAPREDUCE-4534) Test failures with Container .. is running beyond virtual memory limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431615#comment-13431615 ] Ilya Katsov commented on MAPREDUCE-4534: Accidentally created as a duplicate of MAPREDUCE-4533. It should be closed. Test failures with Container .. is running beyond virtual memory limits - Key: MAPREDUCE-4534 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4534 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 0.23.3 Reporter: Ilya Katsov Tests org.apache.hadoop.tools.TestHadoopArchives.{testRelativePath,testPathWithSpaces} fail with the following message: {code} Container [pid=7785,containerID=container_1342495768864_0001_01_01] is running beyond virtual memory limits. Current usage: 143.6mb of 1.5gb physical memory used; 3.4gb of 3.1gb virtual memory used. Killing container. Dump of the process-tree for container_1342495768864_0001_01_01 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 7797 7785 7785 7785 (java) 573 38 3517018112 36421 /usr/java/jdk1.6.0_33/jre/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster |- 7785 7101 7785 7785 (bash) 1 1 108605440 332 /bin/bash -c /usr/java/jdk1.6.0_33/jre/bin/java -Dlog4j.configuration=container-log4j.properties
-Dyarn.app.mapreduce.container.log.dir=/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01/stdout 2>/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01/stderr {code} The problem is not reliably reproducible, but setting MALLOC_ARENA_MAX resolves it.
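The totals in this kill message can be re-derived from the process-tree dump. VMEM_USAGE is reported in bytes and RSSMEM_USAGE in pages. Here is a minimal sketch, assuming a 4096-byte page size (common on x86-64 Linux, but an assumption here) and a CMD_NAME column without spaces. The helper is hypothetical and handles only the column layout shown.

The java and bash rows add up to 3625623552 bytes of virtual memory. That is about 3.4 GiB, which is over the 3.1gb limit. They also add up to 36753 resident pages, about 143.6 MiB. Both match the reported figures.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
PAGE_SIZE = 4096  # assumed; on Linux, os.sysconf("SC_PAGE_SIZE") reports the real value

# The two process rows from the dump, with FULL_CMD_LINE truncated.
DUMP_ROWS = [
    "|- 7797 7785 7785 7785 (java) 573 38 3517018112 36421 /usr/java/jdk1.6.0_33/jre/bin/java",
    "|- 7785 7101 7785 7785 (bash) 1 1 108605440 332 /bin/bash -c",
]


def sum_dump_rows(rows, page_size=PAGE_SIZE):
    """Hypothetical helper: (virtual bytes, resident bytes) summed over the dumped tree."""
    vmem_bytes = rss_pages = 0
    for row in rows:
        fields = row.split()
        # fields[0] is the "|-" marker, so VMEM_USAGE(BYTES) is fields[8]
        # and RSSMEM_USAGE(PAGES) is fields[9] (assumes CMD_NAME has no spaces).
        vmem_bytes += int(fields[8])
        rss_pages += int(fields[9])
    return vmem_bytes, rss_pages * page_size


vmem, rss = sum_dump_rows(DUMP_ROWS)
print(f"{rss / MIB:.1f}mb physical; {vmem / GIB:.1f}gb virtual")  # 143.6mb physical; 3.4gb virtual
```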
[jira] [Commented] (MAPREDUCE-4533) Test failures with Container .. is running beyond virtual memory limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431614#comment-13431614 ] Ilya Katsov commented on MAPREDUCE-4533: Accidentally created as a duplicate of MAPREDUCE-4533. It should be closed. Test failures with Container .. is running beyond virtual memory limits - Key: MAPREDUCE-4533 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4533 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 0.23.3 Reporter: Ilya Katsov Tests org.apache.hadoop.tools.TestHadoopArchives.{testRelativePath,testPathWithSpaces} fail with the following message: {code} Container [pid=7785,containerID=container_1342495768864_0001_01_01] is running beyond virtual memory limits. Current usage: 143.6mb of 1.5gb physical memory used; 3.4gb of 3.1gb virtual memory used. Killing container. Dump of the process-tree for container_1342495768864_0001_01_01 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 7797 7785 7785 7785 (java) 573 38 3517018112 36421 /usr/java/jdk1.6.0_33/jre/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster |- 7785 7101 7785 7785 (bash) 1 1 108605440 332 /bin/bash -c /usr/java/jdk1.6.0_33/jre/bin/java -Dlog4j.configuration=container-log4j.properties
-Dyarn.app.mapreduce.container.log.dir=/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01/stdout 2>/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01/stderr {code} The problem is not reliably reproducible, but setting MALLOC_ARENA_MAX resolves it.
[jira] [Commented] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy
[ https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431639#comment-13431639 ] Ahmed Radwan commented on MAPREDUCE-4469: - Here is a draft patch implementing what I described in my previous comment. Resource calculation in child tasks is CPU-heavy Key: MAPREDUCE-4469 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469 Project: Hadoop Map/Reduce Issue Type: Bug Components: performance, task Affects Versions: 1.0.3 Reporter: Todd Lipcon Assignee: Ahmed Radwan Attachments: MAPREDUCE-4469.patch In doing some benchmarking on a hadoop-1 derived codebase, I noticed that each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed that each was spending a lot of time looping through all the files in /proc to calculate resource usage. As a test, I added a flag to disable use of the ResourceCalculatorPlugin within the tasks. On a CPU-bound 500G-sort workload, this improved total job runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%).
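Here is why the scan is expensive. To attribute usage to a task, the calculator needs the task's descendants. The straightforward way to find them is to read every process's parent pid. So on each sample it lists every numeric entry under /proc and parses each /proc/<pid>/stat file, even though only a handful of processes belong to the task.

Below is a minimal sketch of that full scan. It assumes the standard Linux stat layout from proc(5): ppid is field 4, vsize is field 23 (in bytes), and rss is field 24 (in pages). It is not Hadoop's ResourceCalculatorPlugin code. The function names are hypothetical, and the procfs root is a parameter so the sketch can run against a fake directory.

```python
import os
from collections import defaultdict


def read_stat(proc_root, pid):
    """Parse (ppid, vsize_bytes, rss_pages) from <proc_root>/<pid>/stat."""
    with open(os.path.join(proc_root, pid, "stat")) as f:
        data = f.read()
    # comm (field 2) is wrapped in parentheses and may contain spaces,
    # so split only after the last ')'.
    rest = data[data.rfind(")") + 2:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    return int(rest[1]), int(rest[20]), int(rest[21])


def tree_usage(proc_root, root_pid):
    """Sum vsize and rss over root_pid and all of its descendants.

    Every call opens the stat file of every process on the machine;
    this full walk is the per-sample cost described above.
    """
    children = defaultdict(list)
    stats = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            ppid, vsize, rss = read_stat(proc_root, entry)
        except (OSError, ValueError, IndexError):
            continue  # the process exited mid-scan, or its stat is malformed
        stats[int(entry)] = (vsize, rss)
        children[ppid].append(int(entry))

    total_vsize = total_rss = 0
    stack = [root_pid]
    while stack:
        pid = stack.pop()
        if pid in stats:
            total_vsize += stats[pid][0]
            total_rss += stats[pid][1]
        stack.extend(children.get(pid, []))
    return total_vsize, total_rss
```

On Linux, tree_usage("/proc", os.getpid()) returns the current process tree's totals. Running it in a tight loop makes the syscall volume visible under strace.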
[jira] [Updated] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy
[ https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Radwan updated MAPREDUCE-4469: Attachment: MAPREDUCE-4469_rev2.patch Resource calculation in child tasks is CPU-heavy Key: MAPREDUCE-4469 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469 Project: Hadoop Map/Reduce Issue Type: Bug Components: performance, task Affects Versions: 1.0.3 Reporter: Todd Lipcon Assignee: Ahmed Radwan Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch In doing some benchmarking on a hadoop-1 derived codebase, I noticed that each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed that each was spending a lot of time looping through all the files in /proc to calculate resource usage. As a test, I added a flag to disable use of the ResourceCalculatorPlugin within the tasks. On a CPU-bound 500G-sort workload, this improved total job runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%).
[jira] [Updated] (MAPREDUCE-4470) Fix TestCombineFileInputFormat.testForEmptyFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Katsov updated MAPREDUCE-4470: --- Status: Patch Available (was: Open) Fix TestCombineFileInputFormat.testForEmptyFile --- Key: MAPREDUCE-4470 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4470 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Fix For: 2.1.0-alpha, 3.0.0 Attachments: MAPREDUCE-4470.patch TestCombineFileInputFormat.testForEmptyFile started failing after HADOOP-8599. It expects one split on an empty input file, but with HADOOP-8599 it gets zero. The new behavior seems correct, but is it breaking anything else?
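The disputed case is the zero-length file. Here is a simplified sketch of block-based split computation. It is loosely modeled on FileInputFormat's rule that the last split may run up to 10% over the split size; the 1.1 slop factor is an assumption here. The emit_empty_split flag is hypothetical. It only models the two outcomes described in this issue: one empty split (the test's expectation) versus none (the behavior after HADOOP-8599).

```python
SPLIT_SLOP = 1.1  # assumed: the last split may be up to 10% larger than split_size


def compute_splits(length, split_size, emit_empty_split):
    """Return (offset, length) pairs covering a file of `length` bytes."""
    if length == 0:
        # Old expectation: one empty split. Behavior after HADOOP-8599: none.
        return [(0, 0)] if emit_empty_split else []
    splits = []
    remaining = length
    # Keep cutting full-size splits while the remainder is clearly larger than one split.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((length - remaining, split_size))
        remaining -= split_size
    if remaining:
        splits.append((length - remaining, remaining))
    return splits
```

For non-empty files both policies agree. For example, compute_splits(250, 100, False) yields three splits of 100, 100, and 50 bytes. Only the empty-file case distinguishes them, which is why a per-InputFormat empty-input test catches the change.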
[jira] [Updated] (MAPREDUCE-4470) Fix TestCombineFileInputFormat.testForEmptyFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Katsov updated MAPREDUCE-4470: --- Attachment: MAPREDUCE-4470.patch Fix TestCombineFileInputFormat.testForEmptyFile --- Key: MAPREDUCE-4470 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4470 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Fix For: 2.1.0-alpha, 3.0.0 Attachments: MAPREDUCE-4470.patch TestCombineFileInputFormat.testForEmptyFile started failing after HADOOP-8599. It expects one split on an empty input file, but with HADOOP-8599 it gets zero. The new behavior seems correct, but is it breaking anything else?
[jira] [Commented] (MAPREDUCE-4470) Fix TestCombineFileInputFormat.testForEmptyFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431744#comment-13431744 ] Hadoop QA commented on MAPREDUCE-4470: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12540006/MAPREDUCE-4470.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2719//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2719//console This message is automatically generated. Fix TestCombineFileInputFormat.testForEmptyFile --- Key: MAPREDUCE-4470 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4470 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Fix For: 2.1.0-alpha, 3.0.0 Attachments: MAPREDUCE-4470.patch TestCombineFileInputFormat.testForEmptyFile started failing after HADOOP-8599. It expects one split on an empty input file, but with HADOOP-8599 it gets zero. The new behavior seems correct, but is it breaking anything else?
[jira] [Updated] (MAPREDUCE-4518) FairScheduler: PoolSchedulable#updateDemand() - potential redundant aggregation
[ https://issues.apache.org/jira/browse/MAPREDUCE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated MAPREDUCE-4518: Attachment: trunk-MR-4518.patch Uploading patch for trunk. I couldn't think of a way to test the patch. Can someone suggest a way to test this? FairScheduler: PoolSchedulable#updateDemand() - potential redundant aggregation --- Key: MAPREDUCE-4518 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4518 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Affects Versions: 1.0.3 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: MR-4518_branch1.patch, trunk-MR-4518.patch In the FairScheduler, PoolSchedulable#updateDemand() limits the demand to maxTasks only after iterating through all the pools and computing the final demand. By checking whether the demand has reached maxTasks in every iteration, we can avoid redundant work at the expense of one condition check per iteration.
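The proposed change moves the maxTasks cap inside the loop. Here is a minimal sketch, assuming the per-item demands are non-negative integers; with negative values the early exit would not be equivalent. The function names are hypothetical and do not reproduce PoolSchedulable's code.

```python
def update_demand_capped_after(demands, max_tasks):
    """Current approach: aggregate everything, then cap."""
    return min(sum(demands), max_tasks)


def update_demand_early_exit(demands, max_tasks):
    """Proposed approach: stop as soon as the cap is reached."""
    demand = 0
    for d in demands:
        demand += d
        if demand >= max_tasks:  # the one extra comparison per iteration
            return max_tasks
    return demand
```

For testing, one option is to run both versions on randomly generated demand lists and check that they always agree. One design caveat: if the real loop body does per-item work besides summing, an early exit skips that work for the remaining items. That is worth checking before adopting the change.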
[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mariappan Asokan updated MAPREDUCE-2454: Status: Open (was: Patch Available) Allow external sorter plugin for MR --- Key: MAPREDUCE-2454 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha, 3.0.0, 2.2.0-alpha Reporter: Mariappan Asokan Assignee: Mariappan Asokan Priority: Minor Labels: features, performance, plugin, sort Attachments: HadoopSortPlugin.pdf, KeyValueIterator.java, MR-2454-trunkPatchPreview.gz, MapOutputSorter.java, MapOutputSorterAbstract.java, ReduceInputSorter.java, mapreduce-2454.patch, mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz Define interfaces and some abstract classes in the Hadoop framework to facilitate external sorter plugins both on the Map and Reduce sides.
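A pluggable sorter comes down to a small contract that the framework calls instead of its built-in sort. Here is a minimal sketch of that shape. The names are hypothetical and do not reproduce the interfaces in the attached MapOutputSorter.java or ReduceInputSorter.java. The framework feeds key/value records in and iterates them back out in key order, and any implementation that honors that contract can be swapped in.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator, List, Tuple


class SorterPlugin(ABC):
    """Hypothetical contract between the framework and an external sorter."""

    @abstractmethod
    def add(self, key: Any, value: Any) -> None:
        """Accept one record (map output, or input fetched by the reduce side)."""

    @abstractmethod
    def sorted_records(self) -> Iterator[Tuple[Any, Any]]:
        """Yield all records in key order."""


class InMemorySorter(SorterPlugin):
    """Default plugin: buffer everything in memory and sort once."""

    def __init__(self) -> None:
        self._records: List[Tuple[Any, Any]] = []

    def add(self, key, value):
        self._records.append((key, value))

    def sorted_records(self):
        # Sort by key only; Python's sort is stable, so equal keys keep arrival order.
        return iter(sorted(self._records, key=lambda kv: kv[0]))


def run_sort(plugin: SorterPlugin, records):
    """What the framework would do: push records in, pull sorted records out."""
    for key, value in records:
        plugin.add(key, value)
    return list(plugin.sorted_records())
```

An external implementation would subclass the same contract, for example by spilling to disk or calling out to a native sort. The framework code in run_sort would not change.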
[jira] [Commented] (MAPREDUCE-4518) FairScheduler: PoolSchedulable#updateDemand() - potential redundant aggregation
[ https://issues.apache.org/jira/browse/MAPREDUCE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431891#comment-13431891 ] Hadoop QA commented on MAPREDUCE-4518: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12540039/trunk-MR-4518.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2720//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2720//console This message is automatically generated. FairScheduler: PoolSchedulable#updateDemand() - potential redundant aggregation --- Key: MAPREDUCE-4518 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4518 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Affects Versions: 1.0.3 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: MR-4518_branch1.patch, trunk-MR-4518.patch In the FairScheduler, PoolSchedulable#updateDemand() limits the demand to maxTasks only after iterating through all the pools and computing the final demand.
By checking whether the demand has reached maxTasks in every iteration, we can avoid redundant work at the expense of one condition check per iteration.
[jira] [Updated] (MAPREDUCE-4538) add Legacy Counter support to getGroupNames
[ https://issues.apache.org/jira/browse/MAPREDUCE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-4538: --- Attachment: MR-4538.txt add Legacy Counter support to getGroupNames --- Key: MAPREDUCE-4538 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4538 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 2.1.0-alpha, 3.0.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: MR-4538.txt Oozie loops through counters using getGroupNames(). The returned names do not include the legacy counter group names, so those groups are missed, which can cause a backwards-compatibility issue in the Oozie counter API.
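The fix amounts to making getGroupNames() also report the legacy alias of every group that exists under its current name. Here is a minimal sketch of that idea. The three legacy-to-current pairs below are assumptions based on the MR1 and MR2 counter group names, and they should be verified against the Hadoop source. The class and method names are hypothetical.

```python
# Assumed legacy -> current counter group names; verify against your Hadoop version.
LEGACY_GROUP_NAMES = {
    "org.apache.hadoop.mapred.Task$Counter": "org.apache.hadoop.mapreduce.TaskCounter",
    "org.apache.hadoop.mapred.JobInProgress$Counter": "org.apache.hadoop.mapreduce.JobCounter",
    "FileSystemCounters": "org.apache.hadoop.mapreduce.FileSystemCounter",
}


class Counters:
    """Hypothetical counter container keyed by current group names."""

    def __init__(self, groups):
        self._groups = dict(groups)  # group name -> {counter name: value}

    def get_group(self, name):
        # Lookups translate a legacy name to its current name first.
        return self._groups.get(LEGACY_GROUP_NAMES.get(name, name), {})

    def get_group_names(self):
        # Report current names plus the legacy alias of every present group,
        # so clients that iterate by name (as Oozie does) still see old names.
        names = list(self._groups)
        names += [old for old, new in LEGACY_GROUP_NAMES.items() if new in self._groups]
        return names
```

Only aliases of groups that actually exist are reported. A client iterating the names therefore never receives a legacy name that resolves to an empty group.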
[jira] [Commented] (MAPREDUCE-4538) add Legacy Counter support to getGroupNames
[ https://issues.apache.org/jira/browse/MAPREDUCE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431912#comment-13431912 ] Robert Joseph Evans commented on MAPREDUCE-4538: Something seems to be wrong with this JIRA. I cannot mark it as Patch Available. Hopefully JIRA fixes itself soon, or I will refile this. add Legacy Counter support to getGroupNames --- Key: MAPREDUCE-4538 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4538 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 2.1.0-alpha, 3.0.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: MR-4538.txt Oozie loops through counters using getGroupNames(). The returned names do not include the legacy counter group names, so those groups are missed, which can cause a backwards-compatibility issue in the Oozie counter API.
[jira] [Updated] (MAPREDUCE-4538) add Legacy Counter support to getGroupNames
[ https://issues.apache.org/jira/browse/MAPREDUCE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-4538: --- Issue Type: Improvement (was: Bug) add Legacy Counter support to getGroupNames --- Key: MAPREDUCE-4538 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4538 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.3, 2.1.0-alpha, 3.0.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: MR-4538.txt Oozie loops through counters using getGroupNames(). The returned names do not include the legacy counter group names, so those groups are missed, which can cause a backwards-compatibility issue in the Oozie counter API.
[jira] [Updated] (MAPREDUCE-4538) add Legacy Counter support to getGroupNames
[ https://issues.apache.org/jira/browse/MAPREDUCE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-4538: --- Issue Type: Bug (was: Improvement) add Legacy Counter support to getGroupNames --- Key: MAPREDUCE-4538 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4538 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 2.1.0-alpha, 3.0.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: MR-4538.txt Oozie loops through counters using getGroupNames(). The returned names do not include the legacy counter group names, so those groups are missed, which can cause a backwards-compatibility issue in the Oozie counter API.
[jira] [Updated] (MAPREDUCE-4470) Fix TestCombineFileInputFormat.testForEmptyFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mariappan Asokan updated MAPREDUCE-4470: Attachment: TestFileInputFormat.java I think a proper fix should address all InputFormat implementations. Tests for empty input should be added for all input formats. For example, I added a test in TestFileInputFormat.java to test for empty input. It is also failing. Fix TestCombineFileInputFormat.testForEmptyFile --- Key: MAPREDUCE-4470 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4470 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Fix For: 2.1.0-alpha, 3.0.0 Attachments: MAPREDUCE-4470.patch, TestFileInputFormat.java TestCombineFileInputFormat.testForEmptyFile started failing after HADOOP-8599. It expects one split on an empty input file, but with HADOOP-8599 it gets zero. The new behavior seems correct, but is it breaking anything else?
[jira] [Commented] (MAPREDUCE-4470) Fix TestCombineFileInputFormat.testForEmptyFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431996#comment-13431996 ] Hadoop QA commented on MAPREDUCE-4470: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12540068/TestFileInputFormat.java against trunk revision . -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2721//console This message is automatically generated. Fix TestCombineFileInputFormat.testForEmptyFile --- Key: MAPREDUCE-4470 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4470 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Fix For: 2.1.0-alpha, 3.0.0 Attachments: MAPREDUCE-4470.patch, TestFileInputFormat.java TestCombineFileInputFormat.testForEmptyFile started failing after HADOOP-8599. It expects one split on an empty input file, but with HADOOP-8599 it gets zero. The new behavior seems correct, but is it breaking anything else?
[jira] [Commented] (MAPREDUCE-3782) teragen terasort jobs fail when using webhdfs://
[ https://issues.apache.org/jira/browse/MAPREDUCE-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432007#comment-13432007 ] Robert Joseph Evans commented on MAPREDUCE-3782: I am +1 too; I'll check this in. teragen terasort jobs fail when using webhdfs:// - Key: MAPREDUCE-3782 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3782 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1, 0.24.0 Reporter: Arpit Gupta Assignee: Jason Lowe Priority: Critical Attachments: MAPREDUCE-3782.patch When a teragen job is run with a webhdfs:// URL, the delegation token that is retrieved is an HDFS delegation token, and the subsequent terasort job on the output fails with a Java IO exception.
[jira] [Commented] (MAPREDUCE-3782) teragen terasort jobs fail when using webhdfs://
[ https://issues.apache.org/jira/browse/MAPREDUCE-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432012#comment-13432012 ] Hudson commented on MAPREDUCE-3782: --- Integrated in Hadoop-Common-trunk-Commit #2567 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2567/]) MAPREDUCE-3782. teragen terasort jobs fail when using webhdfs:// (Jason Lowe via bobby) (Revision 1371325) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1371325 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java teragen terasort jobs fail when using webhdfs:// - Key: MAPREDUCE-3782 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3782 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1, 0.24.0 Reporter: Arpit Gupta Assignee: Jason Lowe Priority: Critical Attachments: MAPREDUCE-3782.patch When running a teragen job with a webhdfs:// URL, the delegation token that is retrieved is an HDFS delegation token, and the subsequent terasort job on the output fails with a Java IO exception.
[jira] [Commented] (MAPREDUCE-3782) teragen terasort jobs fail when using webhdfs://
[ https://issues.apache.org/jira/browse/MAPREDUCE-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432013#comment-13432013 ] Hudson commented on MAPREDUCE-3782: --- Integrated in Hadoop-Hdfs-trunk-Commit #2632 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2632/]) MAPREDUCE-3782. teragen terasort jobs fail when using webhdfs:// (Jason Lowe via bobby) (Revision 1371325) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1371325 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java teragen terasort jobs fail when using webhdfs:// - Key: MAPREDUCE-3782 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3782 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1, 0.24.0 Reporter: Arpit Gupta Assignee: Jason Lowe Priority: Critical Attachments: MAPREDUCE-3782.patch When running a teragen job with a webhdfs:// URL, the delegation token that is retrieved is an HDFS delegation token, and the subsequent terasort job on the output fails with a Java IO exception.
[jira] [Updated] (MAPREDUCE-3782) teragen terasort jobs fail when using webhdfs://
[ https://issues.apache.org/jira/browse/MAPREDUCE-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-3782: --- Resolution: Fixed Fix Version/s: 2.2.0-alpha 3.0.0 2.1.0-alpha 0.23.3 Status: Resolved (was: Patch Available) Thanks Jason, I put this into trunk, branch-2, branch-2.1.0-alpha and branch-0.23 teragen terasort jobs fail when using webhdfs:// - Key: MAPREDUCE-3782 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3782 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1, 0.24.0 Reporter: Arpit Gupta Assignee: Jason Lowe Priority: Critical Fix For: 0.23.3, 2.1.0-alpha, 3.0.0, 2.2.0-alpha Attachments: MAPREDUCE-3782.patch When running a teragen job with a webhdfs:// URL, the delegation token that is retrieved is an HDFS delegation token, and the subsequent terasort job on the output fails with a Java IO exception.
[jira] [Updated] (MAPREDUCE-4518) FairScheduler: PoolSchedulable#updateDemand() - potential redundant aggregation
[ https://issues.apache.org/jira/browse/MAPREDUCE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated MAPREDUCE-4518: Status: In Progress (was: Patch Available) FairScheduler: PoolSchedulable#updateDemand() - potential redundant aggregation --- Key: MAPREDUCE-4518 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4518 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Affects Versions: 1.0.3 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: MR-4518_branch1.patch, trunk-MR-4518.patch In FS, PoolSchedulable#updateDemand() limits the demand to maxTasks only after iterating through all the pools and computing the final demand. By checking whether the demand has reached maxTasks in every iteration, we can avoid redundant work, at the expense of one condition check per iteration.
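A minimal sketch of the two loop shapes described in MAPREDUCE-4518, assuming each child's demand has already been computed. `DemandSketch`, `Schedulable` and both method names are hypothetical stand-ins, not the FairScheduler classes. Both return the same capped value for non-negative demands; the early-exit version simply stops visiting children once the cap is reached.

```java
import java.util.List;

// Hypothetical sketch, not the FairScheduler source: only the aggregation loop is modeled.
public class DemandSketch {
    // Stand-in for a child schedulable whose demand has already been computed.
    public interface Schedulable {
        int getDemand();
    }

    // Current shape: sum every child's demand, then cap at maxTasks.
    public static int demandCapAfter(List<? extends Schedulable> children, int maxTasks) {
        int demand = 0;
        for (Schedulable s : children) {
            demand += s.getDemand();
        }
        return Math.min(demand, maxTasks);
    }

    // Proposed shape: one extra comparison per iteration, but stop once the cap is hit.
    public static int demandCapEarly(List<? extends Schedulable> children, int maxTasks) {
        int demand = 0;
        for (Schedulable s : children) {
            demand += s.getDemand();
            if (demand >= maxTasks) {
                return maxTasks; // remaining children are never visited
            }
        }
        return demand;
    }
}
```

One thing worth checking against the real code: if the loop also refreshes each child's demand as it goes, an early exit would skip those refreshes for the remaining children.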
[jira] [Commented] (MAPREDUCE-3782) teragen terasort jobs fail when using webhdfs://
[ https://issues.apache.org/jira/browse/MAPREDUCE-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432034#comment-13432034 ] Hudson commented on MAPREDUCE-3782: --- Integrated in Hadoop-Mapreduce-trunk-Commit #2587 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2587/]) MAPREDUCE-3782. teragen terasort jobs fail when using webhdfs:// (Jason Lowe via bobby) (Revision 1371325) Result = FAILURE bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1371325 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java teragen terasort jobs fail when using webhdfs:// - Key: MAPREDUCE-3782 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3782 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1, 0.24.0 Reporter: Arpit Gupta Assignee: Jason Lowe Priority: Critical Fix For: 0.23.3, 2.1.0-alpha, 3.0.0, 2.2.0-alpha Attachments: MAPREDUCE-3782.patch When running a teragen job with a webhdfs:// URL, the delegation token that is retrieved is an HDFS delegation token, and the subsequent terasort job on the output fails with a Java IO exception.
[jira] [Commented] (MAPREDUCE-4470) Fix TestCombineFileInputFormat.testForEmptyFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432088#comment-13432088 ] Mariappan Asokan commented on MAPREDUCE-4470: - Sorry about the file upload. I did not mean it to be a patch :( Fix TestCombineFileInputFormat.testForEmptyFile --- Key: MAPREDUCE-4470 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4470 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Fix For: 2.1.0-alpha, 3.0.0 Attachments: MAPREDUCE-4470.patch, TestFileInputFormat.java TestCombineFileInputFormat.testForEmptyFile started failing after HADOOP-8599. It expects one split on an empty input file, but with HADOOP-8599 it gets zero. The new behavior seems correct, but is it breaking anything else?
[jira] [Commented] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432093#comment-13432093 ] Evert Lammerts commented on MAPREDUCE-4490: --- We ran into this same issue on 0.20.205; I'll add it as an affected version. JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security) - Key: MAPREDUCE-4490 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490 Project: Hadoop Map/Reduce Issue Type: Bug Components: task-controller, tasktracker Affects Versions: 1.0.3 Reporter: George Datskos When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 1) with more map tasks in a job than there are map slots in the cluster will result in immediate task failures for the second task in each JVM (and then the JVM exits). We have investigated this bug and the root cause is as follows. When using LinuxTaskController, the userlog directory for a task attempt (../userlogs/job/task-attempt) is created only on the first invocation (when the JVM is launched), because userlogs directories are created by the task-controller binary, which only runs *once* per JVM. Therefore, attempting to create log.index is guaranteed to fail with ENOENT, leading to immediate task failure and child JVM exit.
{quote}
2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting logging for a new task attempt_201207241401_0013_m_27_0 in the same JVM as that of the first task /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_06_0
2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running child
ENOENT: No such file or directory
    at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
    at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
    at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
    at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
    at org.apache.hadoop.mapred.Child.main(Child.java:229)
{quote}
The above error occurs in a JVM that runs tasks 6 and 27. Task6 goes smoothly. Then Task27 starts. The directory /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_027_0 is never created, so when mapred.Child tries to write the log.index file for Task27, it fails with ENOENT because the attempt_201207241401_0013_m_027_0 directory does not exist. Therefore, the second task in each JVM is guaranteed to fail (and then the JVM exits) every time when using LinuxTaskController.
Note that this problem does not occur when using the DefaultTaskController, because there the userlogs directories are created for each task (not just for each JVM as with LinuxTaskController). For each task, the TaskRunner calls the TaskController's createLogDir method before attempting to write out an index file.
* DefaultTaskController#createLogDir: creates log directory for each task
* LinuxTaskController#createLogDir: does nothing
** task-controller binary creates log directory [create_attempt_directories] (but only for the first task)
Possible Solution: add a new command to task-controller, *initialize task*, to create attempt directories. Call that command, with ShellCommandExecutor, in the LinuxTaskController#createLogDir method.
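The invariant the report relies on, namely that every attempt (not just the first one in a reused JVM) has its own userlog directory before log.index is written, can be sketched with plain `java.nio.file`. This is a hypothetical illustration of what DefaultTaskController#createLogDir effectively guarantees, not Hadoop code; `AttemptLogDirSketch` and `writeIndex` are invented names. With LinuxTaskController the proposal instead moves the directory creation into a new task-controller command, so this sketch is not a drop-in fix.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: ensure the per-attempt log directory exists for every task in a reused JVM.
public class AttemptLogDirSketch {
    public static Path writeIndex(Path userlogs, String jobId, String attemptId, String contents)
            throws IOException {
        Path attemptDir = userlogs.resolve(jobId).resolve(attemptId);
        // createDirectories is a no-op if the directory already exists, so calling it
        // for every attempt (not only the first one in the JVM) is safe.
        Files.createDirectories(attemptDir);
        Path index = attemptDir.resolve("log.index");
        Files.write(index, contents.getBytes(StandardCharsets.UTF_8));
        return index;
    }
}
```

Skipping the `createDirectories` call for the second attempt reproduces the failure mode from the report: the write fails because the parent directory does not exist.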
[jira] [Updated] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Evert Lammerts updated MAPREDUCE-4490: -- Affects Version/s: 0.20.205.0 JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security) - Key: MAPREDUCE-4490 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490 Project: Hadoop Map/Reduce Issue Type: Bug Components: task-controller, tasktracker Affects Versions: 0.20.205.0, 1.0.3 Reporter: George Datskos When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 1) with more map tasks in a job than there are map slots in the cluster will result in immediate task failures for the second task in each JVM (and then the JVM exits). We have investigated this bug and the root cause is as follows. When using LinuxTaskController, the userlog directory for a task attempt (../userlogs/job/task-attempt) is created only on the first invocation (when the JVM is launched), because userlogs directories are created by the task-controller binary, which only runs *once* per JVM. Therefore, attempting to create log.index is guaranteed to fail with ENOENT, leading to immediate task failure and child JVM exit.
{quote}
2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting logging for a new task attempt_201207241401_0013_m_27_0 in the same JVM as that of the first task /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_06_0
2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running child
ENOENT: No such file or directory
    at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
    at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
    at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
    at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
    at org.apache.hadoop.mapred.Child.main(Child.java:229)
{quote}
The above error occurs in a JVM that runs tasks 6 and 27. Task6 goes smoothly. Then Task27 starts. The directory /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_027_0 is never created, so when mapred.Child tries to write the log.index file for Task27, it fails with ENOENT because the attempt_201207241401_0013_m_027_0 directory does not exist. Therefore, the second task in each JVM is guaranteed to fail (and then the JVM exits) every time when using LinuxTaskController.
Note that this problem does not occur when using the DefaultTaskController, because there the userlogs directories are created for each task (not just for each JVM as with LinuxTaskController). For each task, the TaskRunner calls the TaskController's createLogDir method before attempting to write out an index file.
* DefaultTaskController#createLogDir: creates log directory for each task
* LinuxTaskController#createLogDir: does nothing
** task-controller binary creates log directory [create_attempt_directories] (but only for the first task)
Possible Solution: add a new command to task-controller, *initialize task*, to create attempt directories. Call that command, with ShellCommandExecutor, in the LinuxTaskController#createLogDir method.
[jira] [Commented] (MAPREDUCE-4044) YarnClientProtocolProvider does not honor mapred.job.tracker property
[ https://issues.apache.org/jira/browse/MAPREDUCE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432104#comment-13432104 ] Jason Lowe commented on MAPREDUCE-4044: --- I think this isn't as straightforward as adding a deprecation, primarily because this is a deprecation across configuration files. yarn-default.xml and yarn-site.xml load before mapred-default.xml and mapred-site.xml. If yarn-site.xml sets yarn.resourcemanager.address to an appropriate value, it will later be overwritten with "local" by mapred-default.xml if we tie yarn.resourcemanager.address to mapreduce.jobtracker.address. In addition, I think we'd need to update Configuration deprecation support to handle multiple deprecated keys mapped to the same new key (i.e.: mapred.job.tracker *and* mapreduce.jobtracker.address would both need to map to yarn.resourcemanager.address). I don't think the deprecation code currently handles a many-to-one mapping, although oddly it appears to support one-to-many. Bottom line is that this change smells pretty risky, certainly not as easy as a one-line Configuration.addDeprecation() call. Would it make more sense from a risk-mitigation standpoint to have Oozie set both mapred.job.tracker and yarn.resourcemanager.address so it can work with both 1.x and 2.x? YarnClientProtocolProvider does not honor mapred.job.tracker property - Key: MAPREDUCE-4044 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4044 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.24.0, 0.23.3 Reporter: Alejandro Abdelnur The YarnClientProtocolProvider/YARNRunner/ResourceMgrDelegate bootstrap only looks for 'yarn.resourcemanager.address' and ignores 'mapred.job.tracker'. This breaks backward compatibility and creates issues in Oozie.
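The load-order concern in the comment above can be made concrete with a toy model. `DeprecationOrderSketch` is hypothetical and is NOT Hadoop's Configuration class. It assumes that setting a deprecated key also writes the key it maps to, and that resources are applied in load order. The key names and the "local" default come from the comment; the RM host:port is an invented example value.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: this is NOT Hadoop's Configuration class, just enough to show load order.
public class DeprecationOrderSketch {
    private final Map<String, String> newKeyByOldKey = new HashMap<>();
    private final Map<String, String> props = new HashMap<>();

    public void addDeprecation(String oldKey, String newKey) {
        newKeyByOldKey.put(oldKey, newKey);
    }

    // Assumed behavior: setting a deprecated key also writes the key it maps to.
    public void set(String key, String value) {
        props.put(key, value);
        String newKey = newKeyByOldKey.get(key);
        if (newKey != null) {
            props.put(newKey, value);
        }
    }

    public String get(String key) {
        return props.get(key);
    }

    // Resources are applied in load order, so later files override earlier ones.
    public void load(Map<String, String> resource) {
        for (Map.Entry<String, String> e : resource.entrySet()) {
            set(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        DeprecationOrderSketch conf = new DeprecationOrderSketch();
        conf.addDeprecation("mapreduce.jobtracker.address", "yarn.resourcemanager.address");

        Map<String, String> yarnSite = new LinkedHashMap<>();
        yarnSite.put("yarn.resourcemanager.address", "rm.example.com:8032"); // hypothetical value
        Map<String, String> mapredDefault = new LinkedHashMap<>();
        mapredDefault.put("mapreduce.jobtracker.address", "local");

        conf.load(yarnSite);      // yarn-*.xml load first
        conf.load(mapredDefault); // mapred-default.xml loads later
        System.out.println(conf.get("yarn.resourcemanager.address")); // prints "local"
    }
}
```

In this model, the site-specific RM address is lost purely because of file load order, which is the risk the comment points out with a one-line addDeprecation() approach.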
[jira] [Commented] (MAPREDUCE-4044) YarnClientProtocolProvider does not honor mapred.job.tracker property
[ https://issues.apache.org/jira/browse/MAPREDUCE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432110#comment-13432110 ] Alejandro Abdelnur commented on MAPREDUCE-4044: --- @jason, what you suggest is exactly what Oozie is currently doing. Agreed, the deprecation thingy is not that simple. Still, the problem impacts apps in general, outside of Oozie, that set 'mapred.*' values. YarnClientProtocolProvider does not honor mapred.job.tracker property - Key: MAPREDUCE-4044 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4044 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.24.0, 0.23.3 Reporter: Alejandro Abdelnur The YarnClientProtocolProvider/YARNRunner/ResourceMgrDelegate bootstrap only looks for 'yarn.resourcemanager.address' and ignores 'mapred.job.tracker'. This breaks backward compatibility and creates issues in Oozie.
[jira] [Assigned] (MAPREDUCE-4367) mapred job -kill tries to connect to history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reassigned MAPREDUCE-4367: Assignee: Mayank Bansal mapred job -kill tries to connect to history server --- Key: MAPREDUCE-4367 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4367 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Mayank Bansal Priority: Minor The {{mapred job -kill}} command attempts to connect to the history server, even though it is unrelated to the process of killing a job.
[jira] [Commented] (MAPREDUCE-4367) mapred job -kill tries to connect to history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432139#comment-13432139 ] Mayank Bansal commented on MAPREDUCE-4367: -- As reported, if the history server is configured but not running, the user cannot kill the job. The history server plays no part in killing a job, so my patch short-circuits the history server in the kill case. I have added a test case covering this scenario with the history server both up and down. Thanks, Mayank mapred job -kill tries to connect to history server --- Key: MAPREDUCE-4367 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4367 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Mayank Bansal Priority: Minor Attachments: MAPREDUCE-4367-trunk-v1.patch The {{mapred job -kill}} command attempts to connect to the history server, even though it is unrelated to the process of killing a job.
[jira] [Updated] (MAPREDUCE-4367) mapred job -kill tries to connect to history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated MAPREDUCE-4367: - Attachment: MAPREDUCE-4367-trunk-v1.patch mapred job -kill tries to connect to history server --- Key: MAPREDUCE-4367 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4367 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Mayank Bansal Priority: Minor Attachments: MAPREDUCE-4367-trunk-v1.patch The {{mapred job -kill}} command attempts to connect to the history server, even though it is unrelated to the process of killing a job.
[jira] [Commented] (MAPREDUCE-4538) add Legacy Counter support to getGroupNames
[ https://issues.apache.org/jira/browse/MAPREDUCE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432183#comment-13432183 ] Virag Kothari commented on MAPREDUCE-4538: -- Bobby, MAPREDUCE-4053 will also be fixed by this patch, correct? add Legacy Counter support to getGroupNames --- Key: MAPREDUCE-4538 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4538 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 2.1.0-alpha, 3.0.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: MR-4538.txt Oozie loops through counters using getGroupNames(). The returned names do not include legacy counter group names, so those groups get missed, which can result in a backwards-compatibility issue in the Oozie counter API.
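One way to picture "legacy counter support in getGroupNames" is to return the current group names plus the legacy alias of any group that is present. This is a hypothetical sketch, not the MR-4538 patch: `LegacyGroupNamesSketch` is an invented name, and the `legacyToNew` map is an input the caller supplies. The mapping used in the usage note (old `org.apache.hadoop.mapred.Task$Counter` to `org.apache.hadoop.mapreduce.TaskCounter`) is given as an assumed example.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: expose legacy group names alongside current ones so that
// callers iterating over group names (as Oozie does) also see the old names.
public class LegacyGroupNamesSketch {
    /**
     * @param current     group names the counters object actually holds
     * @param legacyToNew legacy group name -> current group name (caller-supplied)
     */
    public static Set<String> groupNamesWithLegacy(Collection<String> current,
                                                   Map<String, String> legacyToNew) {
        Set<String> names = new LinkedHashSet<>(current); // keeps current names first, no duplicates
        for (Map.Entry<String, String> e : legacyToNew.entrySet()) {
            // Only advertise a legacy alias if the group it points to is present.
            if (current.contains(e.getValue())) {
                names.add(e.getKey());
            }
        }
        return names;
    }
}
```

For example, with `current = ["org.apache.hadoop.mapreduce.TaskCounter"]` and the assumed mapping above, the result contains both the new and the legacy group name. Adding aliases only for present groups is a design choice here; the real patch may behave differently.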
[jira] [Updated] (MAPREDUCE-4367) mapred job -kill tries to connect to history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated MAPREDUCE-4367: - Fix Version/s: trunk Status: Patch Available (was: Open) mapred job -kill tries to connect to history server --- Key: MAPREDUCE-4367 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4367 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Mayank Bansal Priority: Minor Fix For: trunk Attachments: MAPREDUCE-4367-trunk-v1.patch The {{mapred job -kill}} command attempts to connect to the history server, even though it is unrelated to the process of killing a job.
[jira] [Commented] (MAPREDUCE-4044) YarnClientProtocolProvider does not honor mapred.job.tracker property
[ https://issues.apache.org/jira/browse/MAPREDUCE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432203#comment-13432203 ] Arun C Murthy commented on MAPREDUCE-4044: -- I'm confused. If someone sets mapreduce.framework.name to yarn, why should we support mapred.job.tracker? YarnClientProtocolProvider does not honor mapred.job.tracker property - Key: MAPREDUCE-4044 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4044 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.24.0, 0.23.3 Reporter: Alejandro Abdelnur The YarnClientProtocolProvider/YARNRunner/ResourceMgrDelegate bootstrap only looks for 'yarn.resourcemanager.address' and ignores 'mapred.job.tracker'. This breaks backward compatibility and creates issues in Oozie.
[jira] [Assigned] (MAPREDUCE-4535) Test failures with Container .. is running beyond virtual memory limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy reassigned MAPREDUCE-4535: Assignee: Ilya Katsov Test failures with Container .. is running beyond virtual memory limits - Key: MAPREDUCE-4535 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4535 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 0.23.3 Reporter: Ilya Katsov Assignee: Ilya Katsov Attachments: MAPREDUCE-4535-branch-0.23.patch Tests org.apache.hadoop.tools.TestHadoopArchives.{testRelativePath,testPathWithSpaces} fail with the following message:
{code}
Container [pid=7785,containerID=container_1342495768864_0001_01_01] is running beyond virtual memory limits. Current usage: 143.6mb of 1.5gb physical memory used; 3.4gb of 3.1gb virtual memory used. Killing container.
Dump of the process-tree for container_1342495768864_0001_01_01 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 7797 7785 7785 7785 (java) 573 38 3517018112 36421 /usr/java/jdk1.6.0_33/jre/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
{code}
The problem does not reproduce reliably, but setting MALLOC_ARENA_MAX resolves it.
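MALLOC_ARENA_MAX is a glibc environment variable that caps the number of malloc arenas. On 64-bit glibc each extra arena typically reserves a large block of virtual address space (64 MB), which can inflate a JVM's virtual memory without a matching rise in physical usage. A minimal sketch of passing it to a child process with the standard `ProcessBuilder` API follows. `ArenaEnvSketch` is a hypothetical name, the value 4 used in the usage note is illustrative only (the issue does not say which value was used), and this is not how the attached patch sets it.

```java
import java.util.List;

// Hypothetical sketch: launch a child process with glibc's arena count capped.
public class ArenaEnvSketch {
    /** Builds (but does not start) a process whose environment sets MALLOC_ARENA_MAX. */
    public static ProcessBuilder withArenaCap(List<String> command, int maxArenas) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // glibc reads this variable at process startup; it has no effect on non-glibc systems.
        pb.environment().put("MALLOC_ARENA_MAX", Integer.toString(maxArenas));
        return pb;
    }
}
```

Usage: `ArenaEnvSketch.withArenaCap(Arrays.asList("java", "-version"), 4).start()` would run the child with the cap applied; the parent JVM's own environment is unchanged.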
[jira] [Updated] (MAPREDUCE-4474) TestDistributedShell.testDSShell fails on CentOS 6 because of high virtual memory usage
[ https://issues.apache.org/jira/browse/MAPREDUCE-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-4474: - Assignee: Ilya Katsov Status: Open (was: Patch Available) TestDistributedShell.testDSShell fails on CentOS 6 because of high virtual memory usage --- Key: MAPREDUCE-4474 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4474 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 0.23.3 Environment: CentOS 6 Reporter: Ilya Katsov Assignee: Ilya Katsov Labels: test Attachments: MAPREDUCE-4474-branch-0.23.patch TestDistributedShell.testDSShell fails on CentOS 6 because of high virtual memory usage:
{code}
2012-07-24 04:50:46,563 INFO [AsyncDispatcher event handler] rmapp.RMAppImpl (RMAppImpl.java:transition(559)) - Application application_1343091034814_0001 failed 1 times due to AM Container for appattempt_1343091034814_0001_01 exited with exitCode: 143 due to: Container [pid=6146,containerID=container_1343091034814_0001_01_01] is running beyond virtual memory limits. Current usage: 82.4mb of 512.0mb physical memory used; 1.1gb of 1.0gb virtual memory used. Killing container.
Dump of the process-tree for container_1343091034814_0001_01_01 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 6146 5773 6146 6146 (bash) 2 0 108613632 340 /bin/bash -c /usr/java/jdk1.6.0_33/jre/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 128 --
{code}
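The limits in these logs are consistent with the NodeManager deriving a container's virtual-memory limit as its physical limit times a vmem:pmem ratio of 2.1, the default of yarn.nodemanager.vmem-pmem-ratio. This assumes the tests run with that default. Under that assumption, 512 MB x 2.1 is about 1075 MB (roughly 1.05 GB, logged as "1.0gb"), and 1.5 GB x 2.1 is about 3.15 GB (logged as "3.1gb"). `VmemLimitSketch` is a hypothetical helper for that arithmetic, not NodeManager code.

```java
// Hypothetical helper reproducing the vmem-limit arithmetic implied by the log lines.
public class VmemLimitSketch {
    /** Virtual-memory limit in MB, given the container's physical limit and the vmem:pmem ratio. */
    public static double vmemLimitMb(double pmemLimitMb, double vmemPmemRatio) {
        return pmemLimitMb * vmemPmemRatio;
    }

    /** True if a container using vmemUsedMb of virtual memory would exceed its limit. */
    public static boolean exceedsVmem(double vmemUsedMb, double pmemLimitMb, double vmemPmemRatio) {
        return vmemUsedMb > vmemLimitMb(pmemLimitMb, vmemPmemRatio);
    }

    public static void main(String[] args) {
        // From the log: 512 MB physical limit, about 1.1 GB virtual memory used.
        System.out.println(exceedsVmem(1.1 * 1024, 512, 2.1)); // true
    }
}
```

This shows why the AM was killed at only 82.4 MB resident: the check is on virtual size, so anything that inflates vmem (such as extra malloc arenas) can trip it.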
[jira] [Commented] (MAPREDUCE-4474) TestDistributedShell.testDSShell fails on CentOS 6 because of high virtual memory usage
[ https://issues.apache.org/jira/browse/MAPREDUCE-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432216#comment-13432216 ] Arun C Murthy commented on MAPREDUCE-4474: -- Ilya, can u pls rebase your patch after YARN-1? Tx! TestDistributedShell.testDSShell fails on CentOS 6 because of high virtual memory usage --- Key: MAPREDUCE-4474 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4474 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 0.23.3 Environment: CentOS 6 Reporter: Ilya Katsov Assignee: Ilya Katsov Labels: test Attachments: MAPREDUCE-4474-branch-0.23.patch TestDistributedShell.testDSShell fails on CentOS 6 because of high virtual memory usage:
{code}
2012-07-24 04:50:46,563 INFO [AsyncDispatcher event handler] rmapp.RMAppImpl (RMAppImpl.java:transition(559)) - Application application_1343091034814_0001 failed 1 times due to AM Container for appattempt_1343091034814_0001_01 exited with exitCode: 143 due to: Container [pid=6146,containerID=container_1343091034814_0001_01_01] is running beyond virtual memory limits. Current usage: 82.4mb of 512.0mb physical memory used; 1.1gb of 1.0gb virtual memory used. Killing container.
Dump of the process-tree for container_1343091034814_0001_01_01 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 6146 5773 6146 6146 (bash) 2 0 108613632 340 /bin/bash -c /usr/java/jdk1.6.0_33/jre/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 128 --
{code}
[jira] [Updated] (MAPREDUCE-4466) Using URI for yarn.nodemanager log dirs fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-4466: -- Fix Version/s: (was: trunk) Status: Open (was: Patch Available) Thanks for the updated patch, Mayank. Mostly looks good. The findbugs warning can be avoided by catching individual exceptions instead of a generic catchAll. The unit test has some issues. It refers to absolute paths (file:///target/), which will break on most systems. Also, TestNMWebServers isn't the best place to test this. A simple verification of getContainerLogDirs on a path with and without file:// should be sufficient. Unsetting the Fix Version: that needs to be set only after the change is committed. Using URI for yarn.nodemanager log dirs fails - Key: MAPREDUCE-4466 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4466 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.3 Reporter: Eli Collins Assignee: Mayank Bansal Priority: Minor Attachments: MAPREDUCE-4466-trunk-v1.patch, MAPREDUCE-4466-trunk-v2.patch, MAPREDUCE-4466-trunk-v3.patch If I use URIs (e.g. file:///home/eli/hadoop/dirs) for yarn.nodemanager.log-dirs or yarn.nodemanager.remote-app-log-dir, the container log servlet fails with an NPE (it works if I remove the file scheme). Using a URI for yarn.nodemanager.local-dirs works.
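The review asks for a simple check that a log dir works both with and without file://. The kind of normalization such a test would exercise can be sketched as follows. `LogDirPaths.toLocalPath` is a hypothetical helper, not the NodeManager's getContainerLogDirs. It covers local dirs such as yarn.nodemanager.log-dirs only, and it catches the specific URISyntaxException rather than a generic catch-all, in the spirit of the findbugs remark.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not NodeManager code): turn a configured local log dir,
// given either as a plain path or as a file:// URI, into a plain local path.
final class LogDirPaths {
    static String toLocalPath(String configured) {
        final URI uri;
        try {
            uri = new URI(configured);
        } catch (URISyntaxException e) {
            // Not a valid URI (e.g. a Windows path with backslashes): use it verbatim.
            return configured;
        }
        if (uri.getScheme() == null) {
            return configured;          // plain path such as /home/eli/hadoop/dirs
        }
        if ("file".equalsIgnoreCase(uri.getScheme())) {
            return uri.getPath();       // file:///home/eli/hadoop/dirs -> /home/eli/hadoop/dirs
        }
        throw new IllegalArgumentException("Not a local directory: " + configured);
    }

    public static void main(String[] args) {
        System.out.println(toLocalPath("file:///home/eli/hadoop/dirs"));
        System.out.println(toLocalPath("/home/eli/hadoop/dirs"));
    }
}
```

Both calls in `main` print `/home/eli/hadoop/dirs`, which is the "with and without file://" pair the review suggests testing.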
[jira] [Commented] (MAPREDUCE-3881) building fail under Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432274#comment-13432274 ] Trevor Robinson commented on MAPREDUCE-3881: This patch fixes the issue for me. Note that it uses tab characters on the newly added line, though. building fail under Windows --- Key: MAPREDUCE-3881 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3881 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Environment: D:\os\hadoopcommon>mvn --version Apache Maven 3.0.4 (r1232337; 2012-01-17 16:44:56+0800) Maven home: C:\portable\maven\bin\.. Java version: 1.7.0_02, vendor: Oracle Corporation Java home: C:\Program Files (x86)\Java\jdk1.7.0_02\jre Default locale: zh_CN, platform encoding: GBK OS name: windows 7, version: 6.1, arch: x86, family: windows Reporter: Changming Sun Priority: Minor Attachments: pom.xml.patch Original Estimate: 1h Remaining Estimate: 1h hadoop-mapreduce-project\hadoop-yarn\hadoop-yarn-common\pom.xml is not portable:
{code:xml}
<execution>
  <id>generate-version</id>
  <phase>generate-sources</phase>
  <configuration>
    <executable>scripts/saveVersion.sh</executable>
    <arguments>
      <argument>${project.version}</argument>
      <argument>${project.build.directory}</argument>
    </arguments>
  </configuration>
  <goals>
    <goal>exec</goal>
  </goals>
</execution>
{code}
When I built it under Windows, I got the following error:
{code}
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec (generate-version) on project hadoop-yarn-common: Command execution failed. Cannot run program "scripts\saveVersion.sh" (in directory "D:\os\hadoopcommon\hadoop-mapreduce-project\hadoop-yarn\hadoop-yarn-common"): CreateProcess error=2, ? -> [Help 1]
{code}
We should modify it like this (copied from hadoop-common-project\hadoop-common\pom.xml):
{code:xml}
  <configuration>
    <target>
      <mkdir dir="${project.build.directory}/generated-sources/java"/>
      <exec executable="sh">
        <arg line="${basedir}/dev-support/saveVersion.sh ${project.version} ${project.build.directory}/generated-sources/java"/>
      </exec>
    </target>
  </configuration>
</execution>
{code}
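The suggested hadoop-common variant does not ask the OS to execute saveVersion.sh directly. Instead it runs `sh` and passes the script path as an argument, so it depends on an `sh` being on the PATH (on Windows, e.g. from Cygwin). The same command construction is sketched in Java below for illustration only. The class name and the version string are hypothetical; the script path and output directory follow the pom fragment.

```java
import java.util.List;

// Hypothetical sketch mirroring the suggested Ant <exec executable="sh"> change:
// the shell script is passed as an argument to "sh" instead of being executed directly.
final class SaveVersionCommand {
    static List<String> command(String basedir, String version, String outputDir) {
        return List.of("sh", basedir + "/dev-support/saveVersion.sh", version, outputDir);
    }

    public static void main(String[] args) {
        // "0.23.3-SNAPSHOT" is an illustrative version string, not taken from the issue.
        List<String> cmd = command(".", "0.23.3-SNAPSHOT", "target/generated-sources/java");
        System.out.println(String.join(" ", cmd));
        // To actually run it (needs an "sh" on the PATH, e.g. from Cygwin on Windows):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```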
[jira] [Updated] (MAPREDUCE-4491) Encryption and Key Protection
[ https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated MAPREDUCE-4491: Attachment: Hadoop_Encryption.pdf MR_4491_1.1.patch MR_4491_trunk.patch Attaching the initial patches for trunk and branch-1.1. Please review and let me know your comments. Made minor updates to the design document. One of the test cases in the patch depends on a test class that will be part of another JIRA (yet to be filed due to the ASF JIRA problem). Encryption and Key Protection - Key: MAPREDUCE-4491 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491 Project: Hadoop Map/Reduce Issue Type: New Feature Components: documentation, security, task-controller, tasktracker Reporter: Benoy Antony Assignee: Benoy Antony Attachments: Hadoop_Encryption.pdf, Hadoop_Encryption.pdf, MR_4491_1.1.patch, MR_4491_trunk.patch When dealing with sensitive data, the data must be kept encrypted wherever it is stored. A common use case is to pull encrypted data out of a data source and store it in HDFS for analysis. The keys are stored in an external keystore. The feature adds a customizable framework for integrating different types of keystores, support for Java KeyStore, the ability to read keys from keystores, and transport of keys from JobClient to Tasks. The feature also adds PGP encryption as a codec, along with additional utilities to perform encryption-related steps. The design document is attached; it explains the requirements, design and use cases. Kindly review and comment. Collaboration is very much welcome. I have a tested patch for 1.1 and will upload it soon as initial work for further refinement.
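The description mentions a customizable framework for plugging in different keystore types, with Java KeyStore as one supported backend. The patch's actual interfaces are not shown here, so the following is only a sketch of that shape, built on the standard JDK KeyStore API. `KeyResolver` and `JavaKeyStoreResolver` are hypothetical names, the password is illustrative, and the transport of keys from JobClient to Tasks is not modelled.

```java
import java.security.Key;
import java.security.KeyStore;
import java.util.Arrays;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical pluggable abstraction: one implementation per keystore type.
interface KeyResolver {
    byte[] resolve(String alias) throws Exception;
}

// Java KeyStore-backed implementation (the JCEKS type can hold secret keys).
final class JavaKeyStoreResolver implements KeyResolver {
    private final KeyStore keyStore;
    private final char[] password;

    JavaKeyStoreResolver(KeyStore keyStore, char[] password) {
        this.keyStore = keyStore;
        this.password = password.clone();
    }

    @Override
    public byte[] resolve(String alias) throws Exception {
        Key key = keyStore.getKey(alias, password);
        if (key == null) {
            throw new IllegalArgumentException("No key stored under alias " + alias);
        }
        return key.getEncoded();
    }
}

final class KeyStoreDemo {
    /** Stores a fresh AES key in an in-memory keystore and reads it back via the resolver. */
    static boolean roundTrip() throws Exception {
        char[] password = "changeit".toCharArray();     // illustrative password
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(null, null);                             // empty, in-memory keystore
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey secret = generator.generateKey();
        ks.setEntry("job-data-key", new KeyStore.SecretKeyEntry(secret),
                new KeyStore.PasswordProtection(password));

        KeyResolver resolver = new JavaKeyStoreResolver(ks, password);
        return Arrays.equals(resolver.resolve("job-data-key"), secret.getEncoded());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip());
    }
}
```

Supporting another keystore type would mean adding another `KeyResolver` implementation. Code that consumes keys only sees raw key bytes for an alias.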
[jira] [Commented] (MAPREDUCE-4367) mapred job -kill tries to connect to history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432301#comment-13432301 ] Hadoop QA commented on MAPREDUCE-4367: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12540123/MAPREDUCE-4367-trunk-v1.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
-1 javac. The applied patch generated 2071 javac compiler warnings (more than the trunk's current 2070 warnings).
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat org.apache.hadoop.mapreduce.v2.TestYARNRunner
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2722//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2722//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2722//console
This message is automatically generated.
mapred job -kill tries to connect to history server --- Key: MAPREDUCE-4367 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4367 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Mayank Bansal Priority: Minor Fix For: trunk Attachments: MAPREDUCE-4367-trunk-v1.patch The {{mapred job -kill}} command attempts to connect to the history server, even though it is unrelated to the process of killing a job.
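One general way to keep an unrelated command such as kill from touching the history server is to create that client lazily, on first use. The sketch below is hypothetical and is not how the attached patch works. `Lazy`, `JobCommands` and their methods are illustrative names, and a `String` stands in for a real history-server client.

```java
import java.util.function.Supplier;

// Hypothetical sketch: the history-server client is created only on first use,
// so a command that never asks for it (such as kill) never connects to it.
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private T value;
    private boolean initialized;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public synchronized T get() {
        if (!initialized) {
            value = factory.get();
            initialized = true;
        }
        return value;
    }

    synchronized boolean isInitialized() {
        return initialized;
    }
}

final class JobCommands {
    // String stands in for a real history-server client type in this sketch.
    private final Lazy<String> historyClient;

    JobCommands(Lazy<String> historyClient) {
        this.historyClient = historyClient;
    }

    String kill(String jobId) {
        // In this sketch, kill never calls historyClient.get().
        return "kill requested: " + jobId;
    }

    String finishedJobStatus(String jobId) {
        // Looking up an already-completed job is where a history client is used.
        return historyClient.get() + " -> " + jobId;
    }
}
```

For example, `new JobCommands(new Lazy<>(() -> "history-client")).kill("job_1")` completes without ever invoking the factory. The client is built only when `finishedJobStatus` is called.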
[jira] [Updated] (MAPREDUCE-4526) Job Credentials are not transmitted if security is turned off
[ https://issues.apache.org/jira/browse/MAPREDUCE-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated MAPREDUCE-4526: Description: Credentials (secret keys) can be passed to a job via mapreduce.job.credentials.json or mapreduce.job.credentials.binary. These credentials get submitted during job submission and are made available to the task processes. In HADOOP 1, these credentials get submitted and routed to task processes even if security was off. In HADOOP 2, these credentials are transmitted only when security is turned on. This should be changed for two reasons: 1) It is not backward compatible. 2) Credentials should be passed even if security is turned off. was: Credentials (secret keys) can be passed to a job via mapreduce.job.credentials.json or mapreduce.job.credentials.binary. These credentials get submitted during job submission and are made available to the task processes. In HADOOP 1, these credentials get submitted and routed to task processes even if security was off. In HADOOP 2, these credentials are transmitted only when security is turned on. This should be fixed for two reasons: 1) It is not backward compatible. 2) Credentials should be passed even if security is turned off. Job Credentials are not transmitted if security is turned off - Key: MAPREDUCE-4526 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4526 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission, security Affects Versions: 2.0.0-alpha Reporter: Benoy Antony Assignee: Benoy Antony Credentials (secret keys) can be passed to a job via mapreduce.job.credentials.json or mapreduce.job.credentials.binary. These credentials get submitted during job submission and are made available to the task processes. In HADOOP 1, these credentials get submitted and routed to task processes even if security was off. In HADOOP 2, these credentials are transmitted only when security is turned on.
This should be changed for two reasons: 1) It is not backward compatible. 2) Credentials should be passed even if security is turned off.
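The difference between the reported Hadoop 2 behavior and the requested one comes down to whether the security setting gates which credential files are shipped with the job. The model below is a hypothetical illustration using the two configuration keys named in the issue. `CredentialFiles` and its methods are not Hadoop APIs, and a plain `Map` stands in for the job configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the submission step described in MAPREDUCE-4526:
// which configured credential files get loaded and shipped with the job.
final class CredentialFiles {
    static final List<String> KEYS = List.of(
            "mapreduce.job.credentials.binary",
            "mapreduce.job.credentials.json");

    /** Requested behavior (as described for Hadoop 1): independent of the security setting. */
    static List<String> filesToShip(Map<String, String> conf) {
        List<String> files = new ArrayList<>();
        for (String key : KEYS) {
            String path = conf.get(key);
            if (path != null && !path.isEmpty()) {
                files.add(path);
            }
        }
        return files;
    }

    /** Reported Hadoop 2 behavior: nothing is shipped unless security is enabled. */
    static List<String> filesToShipGated(Map<String, String> conf, boolean securityEnabled) {
        return securityEnabled ? filesToShip(conf) : List.of();
    }
}
```

With security off, the gated variant returns an empty list even when mapreduce.job.credentials.json is set. That is the loss of secrets the issue describes.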