[jira] [Commented] (MAPREDUCE-6204) TestJobCounters should use new properties instead of JobConf.MAPRED_TASK_JAVA_OPTS
[ https://issues.apache.org/jira/browse/MAPREDUCE-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548155#comment-14548155 ] Tsuyoshi Ozawa commented on MAPREDUCE-6204: --- [~sam liu], thank you for pinging us. Let me clarify one point: the following options look like they are set by your configuration, as [~jira.shegalov] mentioned. Is that right?
{code}
-Xmx1000m -Xms1000m -Xmn100m -Xtune:virtualized
-Xshareclasses:name=mrscc_%g,groupAccess,cacheDir=/var/hadoop/tmp,nonFatal -Xscmx20m
-Xdump:java:file=/var/hadoop/tmp/javacore.%Y%m%d.%H%M%S.%pid.%seq.txt
-Xdump:heap:file=/var/hadoop/tmp/heapdump.%Y%m%d.%H%M%S.%pid.%seq.phd
{code}
TestJobCounters should use new properties instead of JobConf.MAPRED_TASK_JAVA_OPTS --- Key: MAPREDUCE-6204 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6204 Project: Hadoop Map/Reduce Issue Type: Test Components: test Affects Versions: 2.6.0 Reporter: sam liu Assignee: sam liu Priority: Minor Labels: BB2015-05-RFC Attachments: MAPREDUCE-6204-1.patch, MAPREDUCE-6204-2.patch, MAPREDUCE-6204-3.patch, MAPREDUCE-6204-4.patch, MAPREDUCE-6204.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
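For context, JobConf.MAPRED_TASK_JAVA_OPTS refers to the deprecated mapred.child.java.opts key, which MRv2 split into per-task-type keys. A minimal sketch of the old-to-new mapping the test should rely on (a plain Map stands in for Hadoop's Configuration so the example is self-contained; the class and helper names are hypothetical):

```java
import java.util.Map;

public class JavaOptsProps {
    // Deprecated single key (JobConf.MAPRED_TASK_JAVA_OPTS).
    static final String OLD_KEY = "mapred.child.java.opts";

    // Per-task-type replacements introduced in MRv2.
    static final Map<String, String> NEW_KEYS = Map.of(
        "map",    "mapreduce.map.java.opts",
        "reduce", "mapreduce.reduce.java.opts");

    // Pick the new-style key for a task type, falling back to the old one.
    static String keyFor(String taskType) {
        return NEW_KEYS.getOrDefault(taskType, OLD_KEY);
    }
}
```

A test would then set, say, `mapreduce.map.java.opts` rather than the deprecated constant, so site-specific JVM options (like the IBM JVM flags quoted above) are picked up per task type.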
[jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high
[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547929#comment-14547929 ] Hadoop QA commented on MAPREDUCE-5965: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 10s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 25s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 42s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | tools/hadoop tests | 6m 14s | Tests passed in hadoop-streaming. 
| | | | 42m 37s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733519/MAPREDUCE-5965.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 363c355 | | hadoop-streaming test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5741/artifact/patchprocess/testrun_hadoop-streaming.txt | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5741/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5741/console | This message was automatically generated. Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high Key: MAPREDUCE-5965 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Arup Malakar Assignee: Arup Malakar Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch, MAPREDUCE-5965.patch Hadoop streaming exposes all the key-value pairs in the job conf as environment variables when it forks a process for the streaming code to run. Unfortunately the variable mapreduce_input_fileinputformat_inputdir contains the list of input files, and Linux has a limit on the combined size of environment variables and arguments. Depending on how long the list of files and their full paths is, this could be pretty huge. And given that these variables are not even used, this stops the user from running a hadoop job with a large number of files, even though it could otherwise run. Linux returns E2BIG (error code 7) if the size exceeds a certain limit, and Java translates that to error=7, Argument list too long. More: http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping a variable if it is longer than a certain length.
That way the job fails only if user code actually requires the skipped environment variable. It should also introduce a config variable to enable skipping long variables, set to false by default, so that the user has to explicitly set it to true to invoke this feature. Here is the exception:
{code}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at
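The proposed mitigation amounts to a size filter applied to the environment map before the child process is forked. A minimal sketch under stated assumptions (the class name, the 20 KB threshold, and the flag are hypothetical; the real patch would read both from the job conf):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnvFilter {
    // Hypothetical threshold; a real implementation would make both the
    // flag and the limit configurable via the job conf.
    static final int MAX_ENV_VALUE_LEN = 20 * 1024;

    // Drop oversized values before fork/exec so execve() does not fail
    // with E2BIG ("error=7, Argument list too long").
    static Map<String, String> filter(Map<String, String> env, boolean skipLongValues) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (skipLongValues && e.getValue().length() > MAX_ENV_VALUE_LEN) {
                continue; // e.g. mapreduce_input_fileinputformat_inputdir with thousands of paths
            }
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }
}
```

With the flag off (the proposed default), nothing changes; with it on, only pathological values are dropped, so a streaming script that reads a normal-sized variable still sees it.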
[jira] [Assigned] (MAPREDUCE-347) Improve the way error messages are displayed from jobclient
[ https://issues.apache.org/jira/browse/MAPREDUCE-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruth Wisniewski reassigned MAPREDUCE-347: - Assignee: Ruth Wisniewski Improve the way error messages are displayed from jobclient --- Key: MAPREDUCE-347 URL: https://issues.apache.org/jira/browse/MAPREDUCE-347 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Peeyush Bishnoi Assignee: Ruth Wisniewski Labels: newbie Today if a job is submitted with an already existing output directory then an exception trace is displayed on the client. A simple message like '{{Error running job as output path already exists}}' might suffice. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
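The improvement requested above is essentially a client-side pre-check that turns the exception trace into a one-line message. A minimal sketch, with java.io.File standing in for Hadoop's FileSystem API so the example is self-contained (class and method names are hypothetical):

```java
import java.io.File;

public class OutputPathCheck {
    // Return a friendly one-line error instead of letting job submission
    // throw a raw stack trace at the user; null means it is safe to submit.
    static String validate(String outputDir) {
        if (new File(outputDir).exists()) {
            return "Error running job as output path already exists: " + outputDir;
        }
        return null;
    }
}
```

In the real jobclient the same check would go through `FileSystem.exists(Path)` before `Job.submit()`, printing the message and exiting nonzero rather than dumping the trace.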
[jira] [Commented] (MAPREDUCE-6368) Unreachable Java code
[ https://issues.apache.org/jira/browse/MAPREDUCE-6368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547690#comment-14547690 ] Akira AJISAKA commented on MAPREDUCE-6368: -- Thanks [~dhirajnilange] for reporting this issue. I think the condition can be true.
{code}
float stepSize = samples.length / (float) numPartitions;
int last = -1;
for(int i = 1; i < numPartitions; ++i) {
  int k = Math.round(stepSize * i);
  while (last >= k && comparator.compare(samples[last], samples[k]) == 0) {
    ++k;
  }
  writer.append(samples[k], nullValue);
  last = k;
}
{code}
{{k = Math.round(stepSize * i)}} can be equal to {{last = Math.round(stepSize * (i-1))}} if {{stepSize}} is less than 1. Unreachable Java code - Key: MAPREDUCE-6368 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6368 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.6.0 Reporter: Dhiraj Nilange Priority: Minor Reference Class: org.apache.hadoop.mapreduce.lib.partition.InputSampler Method: writePartitionFile Line: 337 The issue exists in the following loop at line 337:
{code}
while (last >= k && comparator.compare(samples[last], samples[k]) == 0) {
  ++k;
}
{code}
The problem is that the first condition in the while loop (last >= k) appears to always be false: the value of 'last' will always be less than 'k', so the first condition never evaluates to true. There is a second condition as well, but since it is joined by AND, it is never checked once the first condition is false. Hence this loop does not contribute to the method's output. If it was intended to execute, it needs investigation; but from what I have observed it doesn't seem to affect the output of the method, in which case it could simply be removed. If it was written with some other intention, it needs to be corrected, as it is currently unreachable code.
This issue definitely exists in release 2.6.0; I have not looked at the 2.7.0 source code, but it may well exist there as well (it's worth checking). Thanks and regards, Dhiraj -- This message was sent by Atlassian JIRA (v6.3.4#6332)
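Akira's point is easy to verify numerically: when there are more partitions than samples, stepSize drops below 1, consecutive rounded indices collide, and `last >= k` does fire. A small self-contained demonstration of the same index arithmetic (the input sizes are illustrative, not from a real job):

```java
public class StepSizeDemo {
    // Mirrors the index arithmetic in InputSampler#writePartitionFile and
    // reports whether the supposedly unreachable (last >= k) condition fires.
    static boolean collides(int numSamples, int numPartitions) {
        float stepSize = numSamples / (float) numPartitions;
        int last = -1;
        for (int i = 1; i < numPartitions; ++i) {
            int k = Math.round(stepSize * i);
            if (last >= k) {
                return true; // consecutive rounded indices collided
            }
            last = k;
        }
        return false;
    }
}
```

With 3 samples and 5 partitions, stepSize is 0.6; Math.round(0.6) and Math.round(1.2) are both 1, so the condition is reachable, exactly as the comment argues.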
[jira] [Commented] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search
[ https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548380#comment-14548380 ] Hadoop QA commented on MAPREDUCE-6350: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 47s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 3s | The applied patch generated 2 new checkstyle issues (total was 15, now 17). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 54s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 55s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | mapreduce tests | 0m 46s | Tests passed in hadoop-mapreduce-client-common. 
| | | | 48m 9s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733575/YARN-1614.v3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / bcc1786 | | checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5742/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5742/artifact/patchprocess/whitespace.txt | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5742/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-mapreduce-client-common test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5742/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5742/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5742/console | This message was automatically generated. JobHistory doesn't support fully-functional search -- Key: MAPREDUCE-6350 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch, YARN-1614.v3.patch job history server will only output the first 50 characters of the job names in webUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated MAPREDUCE-5690: Resolution: Duplicate Status: Resolved (was: Patch Available) Closing this as a duplicate of MAPREDUCE-4376. TestLocalMRNotification.testMR occasionally fails - Key: MAPREDUCE-5690 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Liyin Liang Assignee: Liyin Liang Attachments: MAPREDUCE-5690.1.diff TestLocalMRNotification is occasionally failing with the error:
{code}
--- Test set: org.apache.hadoop.mapred.TestLocalMRNotification ---
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification
testMR(org.apache.hadoop.mapred.TestLocalMRNotification) Time elapsed: 24.881 sec ERROR!
java.io.IOException: Job cleanup didn't start in 20 seconds
    at org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685)
    at org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:243)
    at junit.framework.TestSuite.run(TestSuite.java:238)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated MAPREDUCE-4978: Resolution: Won't Fix Status: Resolved (was: Patch Available) This patch adds code to set the properties listed below. * map.input.file * map.input.start * map.input.length Few parts of the current MapReduce code use these: * They are used by CombineFileRecordReader, but it sets them by itself * map.input.file is used by MultipleOutputFormat, which is old-api I am closing this as Won't Fix. Please reopen it if you need this, [~liangly]. Add a updateJobWithSplit() method for new-api job - Key: MAPREDUCE-4978 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 1.1.2 Reporter: Liyin Liang Assignee: Liyin Liang Attachments: 4978-1.diff HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api jobs. It would be better to add an equivalent method for new-api jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated MAPREDUCE-5690: Labels: (was: BB2015-05-TBR) TestLocalMRNotification.testMR occasionally fails - Key: MAPREDUCE-5690 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Liyin Liang Assignee: Liyin Liang Attachments: MAPREDUCE-5690.1.diff -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated MAPREDUCE-4978: Labels: (was: BB2015-05-TBR) Add a updateJobWithSplit() method for new-api job - Key: MAPREDUCE-4978 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 1.1.2 Reporter: Liyin Liang Assignee: Liyin Liang Attachments: 4978-1.diff HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api job. It's better to add another method for new-api job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search
[ https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated MAPREDUCE-6350: --- Attachment: YARN-1614.v3.patch JobHistory doesn't support fully-functional search -- Key: MAPREDUCE-6350 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch, YARN-1614.v3.patch job history server will only output the first 50 characters of the job names in webUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MAPREDUCE-5074) Remove limits on number of counters and counter groups in MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash resolved MAPREDUCE-5074. - Resolution: Won't Fix We can re-open this if we find users compelling us to increase the limits Remove limits on number of counters and counter groups in MapReduce --- Key: MAPREDUCE-5074 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5074 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am, mrv2 Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Ravi Prakash Can we please consider removing limits on the number of counters and counter groups now that it is all user code? Thanks to the much better architecture of YARN in which there is no single Job Tracker we have to worry about overloading, I feel we should do away with this (now arbitrary) constraint on users' capabilities. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MAPREDUCE-3010) ant mvn-install doesn't work on hadoop-mapreduce-project
[ https://issues.apache.org/jira/browse/MAPREDUCE-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash resolved MAPREDUCE-3010. - Resolution: Invalid We moved to Maven a long time ago. ant mvn-install doesn't work on hadoop-mapreduce-project Key: MAPREDUCE-3010 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3010 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Ravi Prakash Even though ant jar works, ant mvn-install fails in the compile-fault-inject step -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4711) Append time elapsed since job-start-time for finished tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated MAPREDUCE-4711: Resolution: Won't Fix Status: Resolved (was: Patch Available) Append time elapsed since job-start-time for finished tasks --- Key: MAPREDUCE-4711 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4711 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 0.23.3 Reporter: Ravi Prakash Labels: BB2015-05-TBR Attachments: MAPREDUCE-4711.branch-0.23.patch In 0.20.x/1.x, the analyze job link gave this information bq. The last Map task task_sometask finished at (relative to the Job launch time): 5/10 20:23:10 (1hrs, 27mins, 54sec) The time it took for the last task to finish needs to be calculated mentally in 0.23. I believe we should print it next to the finish time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5626) TaskLogServlet could not get syslog
[ https://issues.apache.org/jira/browse/MAPREDUCE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated MAPREDUCE-5626: Labels: patch (was: BB2015-05-TBR patch) TaskLogServlet could not get syslog --- Key: MAPREDUCE-5626 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5626 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Environment: Linux version 2.6.18-238.9.1.el5 Java(TM) SE Runtime Environment (build 1.6.0_43-b01) hadoop-1.2.1 Reporter: yangjun Priority: Minor Labels: patch Fix For: 1.2.1 Original Estimate: 2h Remaining Estimate: 2h When multiple tasks reuse one JVM and generate logs, e.g. ./attempt_201211220735_0001_m_00_0: log.index ./attempt_201211220735_0001_m_01_0: log.index ./attempt_201211220735_0001_m_02_0: log.index stderr stdout syslog then fetching from http://:50060/tasklog?attemptid= attempt_201211220735_0001_m_00_0 returns stderr and stdout, but not the others, including syslog. See the TaskLogServlet.haveTaskLog() method: it does not check the local log.index, but checks the original path.
resolve: modify the TaskLogServlet haveTaskLog method as follows:
{code}
private boolean haveTaskLog(TaskAttemptID taskId, boolean isCleanup,
    TaskLog.LogName type) throws IOException {
  File f = TaskLog.getTaskLogFile(taskId, isCleanup, type);
  if (f.exists() && f.canRead()) {
    return true;
  } else {
    File indexFile = TaskLog.getIndexFile(taskId, isCleanup);
    if (!indexFile.exists()) {
      return false;
    }
    BufferedReader fis;
    try {
      fis = new BufferedReader(new InputStreamReader(
          SecureIOUtils.openForRead(indexFile, TaskLog.obtainLogDirOwner(taskId))));
    } catch (FileNotFoundException ex) {
      LOG.warn("Index file for the log of " + taskId + " does not exist.");
      // Assume no task reuse is used and files exist on attemptdir
      StringBuffer input = new StringBuffer();
      input.append(LogFileDetail.LOCATION
          + TaskLog.getAttemptDir(taskId, isCleanup) + "\n");
      for (LogName logName : TaskLog.LOGS_TRACKED_BY_INDEX_FILES) {
        input.append(logName + ":0 -1\n");
      }
      fis = new BufferedReader(new StringReader(input.toString()));
    }
    try {
      String str = fis.readLine();
      if (str == null) { // the file doesn't have anything
        throw new IOException("Index file for the log of " + taskId + " is empty.");
      }
      String loc = str.substring(str.indexOf(LogFileDetail.LOCATION)
          + LogFileDetail.LOCATION.length());
      File tf = new File(loc, type.toString());
      return tf.exists() && tf.canRead();
    } finally {
      if (fis != null) {
        fis.close();
      }
    }
  }
}
{code}
workaround: adding filter=SYSLOG to the URL prints the syslog as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5626) TaskLogServlet could not get syslog
[ https://issues.apache.org/jira/browse/MAPREDUCE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated MAPREDUCE-5626: Resolution: Won't Fix Status: Resolved (was: Patch Available) I think this could be closed as Won't Fix. [~yangj...@sohu.com], could you attach a patch file to JIRA as described in the [wiki|https://wiki.apache.org/hadoop/HowToContribute] if you have an update for this or another issue? TaskLogServlet could not get syslog --- Key: MAPREDUCE-5626 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5626 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Environment: Linux version 2.6.18-238.9.1.el5 Java(TM) SE Runtime Environment (build 1.6.0_43-b01) hadoop-1.2.1 Reporter: yangjun Priority: Minor Labels: patch Fix For: 1.2.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548628#comment-14548628 ] Hadoop QA commented on MAPREDUCE-5690: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | patch | 0m 1s | The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. | | {color:blue}0{color} | pre-patch | 5m 11s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 33s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 40s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 108m 13s | Tests passed in hadoop-mapreduce-client-jobclient. 
| | | | 124m 34s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12619471/MAPREDUCE-5690.1.diff | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 060c84e | | whitespace | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5743/artifact/patchprocess/whitespace.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5743/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5743/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5743/console | This message was automatically generated. TestLocalMRNotification.testMR occasionally fails - Key: MAPREDUCE-5690 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Liyin Liang Assignee: Liyin Liang Attachments: MAPREDUCE-5690.1.diff
[jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high
[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549114#comment-14549114 ] Ray Chiang commented on MAPREDUCE-5965: --- Thanks Wilfred. I guess I'll comment on the meta issue first. In general, I'm not sure whether it's a good idea to filter based purely on size. Would it be better to have a firmer whitelist and/or blacklist capability for Hadoop streaming? Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high Key: MAPREDUCE-5965 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Arup Malakar Assignee: Arup Malakar Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch, MAPREDUCE-5965.patch Hadoop streaming exposes all the key/value pairs in the job conf as environment variables when it forks a process for the streaming code to run. Unfortunately, the variable mapreduce_input_fileinputformat_inputdir contains the list of input files, and Linux has a limit on the combined size of environment variables and arguments. Depending on how long the list of files and their full paths is, this can be pretty huge. Given that these variables are often not even used, this stops users from running a Hadoop job with a large number of files, even though it could otherwise run. Linux throws E2BIG if the size exceeds a certain limit, which is error code 7, and Java translates that to "error=7, Argument list too long". More: http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping a variable if it is longer than a certain length. That way, the job would only fail if user code actually requires that environment variable. It should also introduce a config variable to skip long variables, set to false by default, so the user has to explicitly set it to true to invoke this feature.
Here is the exception:
{code}
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program /data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_06/./rbenv_runner.sh: error=7, Argument list too long
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=7, Argument list too long
{code}
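The skip-long-variables idea from the description can be sketched as follows. This is a minimal illustration under my own assumptions, not the actual patch: the class name `EnvFilter`, the method `filterEnv`, and the 20 KB limit are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class EnvFilter {
    // Hypothetical length limit; the real patch may use a different
    // threshold or make it configurable.
    static final int LEN_LIMIT = 20 * 1024;

    // Export only jobconf values short enough to keep the forked child's
    // environment under the kernel's argument+environment ceiling (E2BIG).
    static Map<String, String> filterEnv(Map<String, String> conf, int limit) {
        Map<String, String> env = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getValue().length() <= limit) {
                // Streaming exports conf keys with '.' replaced by '_'.
                env.put(e.getKey().replace('.', '_'), e.getValue());
            }
        }
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.job.name", "wordcount");
        // A huge input-dir list, like the one that triggers E2BIG.
        conf.put("mapreduce.input.fileinputformat.inputdir",
                String.join(",", Collections.nCopies(10000, "/data/input/part")));
        Map<String, String> env = filterEnv(conf, LEN_LIMIT);
        System.out.println(env.containsKey("mapreduce_job_name"));            // true
        System.out.println(env.containsKey("mapreduce_input_fileinputformat_inputdir")); // false
    }
}
```

Note the trade-off the reporter calls out: a skipped variable only matters if the user's streaming script actually reads it, which is why the behavior should be opt-in.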
[jira] [Commented] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search
[ https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549356#comment-14549356 ] Siqi Li commented on MAPREDUCE-6350: Hi [~mitdesai], thank you for your feedback. I have uploaded patch v3, which fixes those style issues. JobHistory doesn't support fully-functional search -- Key: MAPREDUCE-6350 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: MAPREDUCE-6350.v1.patch, YARN-1614.v1.patch, YARN-1614.v2.patch, YARN-1614.v3.patch The job history server only outputs the first 50 characters of job names in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
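For illustration, the display behavior the report describes amounts to a fixed-length cutoff like the sketch below. `JobNameDisplay` and `shorten` are hypothetical helper names, not HistoryServer code; 50 is the limit the report cites.

```java
public class JobNameDisplay {
    // Truncate a job name to at most `max` characters for display,
    // as the history server web UI does with a 50-character limit.
    static String shorten(String name, int max) {
        return name.length() <= max ? name : name.substring(0, max);
    }

    public static void main(String[] args) {
        String longName = "x".repeat(60);
        System.out.println(shorten(longName, 50).length()); // 50
        System.out.println(shorten("wordcount", 50));       // wordcount
    }
}
```

The search problem follows from this: a query containing characters beyond the cutoff can never match what the UI shows.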
[jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high
[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549128#comment-14549128 ] Ray Chiang commented on MAPREDUCE-5965: --- Making these comments assuming the current patch is an acceptable design approach, I have the following nitpicks: 1) Can stream.truncate.long.jobconf.values be put in the appropriate *-default.xml file for documentation purposes? 2) Can the lenLimit correspond to a Configuration variable?
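A *-default.xml entry along the lines of nitpick 1) might look like the following. The property name comes from the comment above; the default value and description wording here are assumptions, not the committed patch:

```xml
<!-- Hypothetical mapred-default.xml entry; value and description text
     are illustrative assumptions, not the actual committed patch. -->
<property>
  <name>stream.truncate.long.jobconf.values</name>
  <value>false</value>
  <description>If true, Hadoop streaming truncates job configuration values
  that exceed the configured length limit before exporting them as
  environment variables, avoiding "error=7, Argument list too long"
  when forking the streaming process. Off by default so users must
  opt in to the behavior.</description>
</property>
```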
[jira] [Updated] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search
[ https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated MAPREDUCE-6350: --- Attachment: MAPREDUCE-6350.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search
[ https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549342#comment-14549342 ] Hadoop QA commented on MAPREDUCE-6350: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 6s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 2s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 35s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | mapreduce tests | 0m 46s | Tests passed in hadoop-mapreduce-client-common. 
| | | | 48m 39s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733627/MAPREDUCE-6350.v1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0790275 | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5744/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-mapreduce-client-common test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5744/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5744/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5744/console | This message was automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ericson yang reassigned MAPREDUCE-1380: --- Assignee: ericson yang (was: Jordà Polo) Adaptive Scheduler -- Key: MAPREDUCE-1380 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.4.1 Reporter: Jordà Polo Assignee: ericson yang Priority: Minor Attachments: MAPREDUCE-1380-branch-1.2.patch, MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch, MAPREDUCE-1380_1.1.pdf The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically adjusts the amount of used resources depending on the performance of jobs and on user-defined high-level business goals. Existing Hadoop schedulers are focused on managing large, static clusters in which nodes are added or removed manually. On the other hand, the goal of this scheduler is to improve the integration of Hadoop and the applications that run on top of it with environments that allow a more dynamic provisioning of resources. The current implementation is quite straightforward. Users specify a deadline at job submission time, and the scheduler adjusts the resources to meet that deadline (at the moment, the scheduler can be configured to either minimize or maximize the amount of resources). If multiple jobs are run simultaneously, the scheduler prioritizes them by deadline. Note that the current approach to estimate the completion time of jobs is quite simplistic: it is based on the time it takes to finish each task, so it works well with regular jobs, but there is still room for improvement for unpredictable jobs. The idea is to further integrate it with cloud-like and virtual environments (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't able to meet its deadline, the scheduler automatically requests more resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
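The deadline-based prioritization the description outlines can be sketched as a simple earliest-deadline-first ordering. `JobInfo` and `prioritize` are illustrative names under my own assumptions, not classes from the attached patches:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DeadlineOrdering {
    // Hypothetical per-job record: just an id and a user-supplied deadline.
    static class JobInfo {
        final String id;
        final long deadlineMillis;
        JobInfo(String id, long deadlineMillis) {
            this.id = id;
            this.deadlineMillis = deadlineMillis;
        }
    }

    // Earliest deadline first: the job closest to its deadline runs next,
    // mirroring "the scheduler prioritizes them by deadline" above.
    static List<JobInfo> prioritize(List<JobInfo> jobs) {
        List<JobInfo> sorted = new ArrayList<>(jobs);
        sorted.sort(Comparator.comparingLong(j -> j.deadlineMillis));
        return sorted;
    }

    public static void main(String[] args) {
        List<JobInfo> jobs = List.of(
                new JobInfo("job_2", 5000L),
                new JobInfo("job_1", 1000L));
        System.out.println(prioritize(jobs).get(0).id); // job_1
    }
}
```

The actual scheduler additionally estimates completion time from per-task durations to decide how many resources each job needs, which this ordering sketch deliberately leaves out.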
[jira] [Commented] (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549609#comment-14549609 ] ericson yang commented on MAPREDUCE-1380: - I am a beginner with Hadoop and I want to work on this issue, but I have some questions: 1. What exactly does the adaptive scheduler do, and how does it differ from the capacity scheduler? 2. As I understand it, the adaptive scheduler lives in the mapreduce package; why is it not in the yarn package? 3. I have the Hadoop 2.4.1 source code; how can I apply the patch files above to add the adaptive scheduler? Please forgive my poor English. Would you please give me a hand? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high
[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549661#comment-14549661 ] Wilfred Spiegelenburg commented on MAPREDUCE-5965: -- Arup: Do you mind if I assign the jira to myself? I would like to get this fixed in an upcoming release.
[jira] [Commented] (MAPREDUCE-6204) TestJobCounters should use new properties instead JobConf.MAPRED_TASK_JAVA_OPTS
[ https://issues.apache.org/jira/browse/MAPREDUCE-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549752#comment-14549752 ] sam liu commented on MAPREDUCE-6204: Hi Tsuyoshi, in fact there is no Hadoop cluster or Hadoop-related configuration in my dev env on Power Linux, so I am also not sure why 'MAPRED_MAP_TASK_JAVA_OPTS' and 'MAPRED_REDUCE_TASK_JAVA_OPTS' have the above default and unexpected values there. As we mentioned, the root cause of the issue is described in MAPREDUCE-6205, but its patch has not been reviewed/accepted yet. Therefore, the current fix still makes good sense for the unit test TestJobCounters: it explicitly replaces the deprecated property with the proper ones, really sets the correct values for the map/reduce opts properties, works around the unexpected env/configuration issue, and makes the test more robust. Thanks! TestJobCounters should use new properties instead JobConf.MAPRED_TASK_JAVA_OPTS --- Key: MAPREDUCE-6204 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6204 Project: Hadoop Map/Reduce Issue Type: Test Components: test Affects Versions: 2.6.0 Reporter: sam liu Assignee: sam liu Priority: Minor Labels: BB2015-05-RFC Attachments: MAPREDUCE-6204-1.patch, MAPREDUCE-6204-2.patch, MAPREDUCE-6204-3.patch, MAPREDUCE-6204-4.patch, MAPREDUCE-6204.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6204) TestJobCounters should use new properties instead JobConf.MAPRED_TASK_JAVA_OPTS
[ https://issues.apache.org/jira/browse/MAPREDUCE-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549762#comment-14549762 ] Tsuyoshi Ozawa commented on MAPREDUCE-6204: --- OK. I agree with the fix to use the newer properties instead of the deprecated one. [~jira.shegalov], as you mentioned, the test failure may be unrelated and is addressed in MAPREDUCE-6205. However, we should still move to the newer and, we can say, more proper properties. Do you agree with fixing this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ericson yang updated MAPREDUCE-1380: Assignee: Jordà Polo (was: ericson yang) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high
[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated MAPREDUCE-5965: - Attachment: MAPREDUCE-5965.2.patch Ran into the same issue. Rebased and cleaned up the patch; it does the same as the Hive patch (truncates the environment value).
Here is the exception:
{code}
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program /data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_06/./rbenv_runner.sh: error=7, Argument list too long
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=7, Argument list too long
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.init(UNIXProcess.java:135)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
	... 24 more
{code}
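The mitigation proposed above (skip or truncate oversized environment values before forking the streaming process) can be sketched as follows. This is an illustrative sketch, not the attached patch: the class name, the method name `buildSafeEnvironment`, and the `SAFE_ENV_LIMIT` threshold are all assumptions; the real limit would depend on the kernel's argument/environment size limit (ARG_MAX).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed fix for MAPREDUCE-5965: cap the
// length of any job-conf value exposed as an environment variable, so
// that values like mapreduce_input_fileinputformat_inputdir cannot push
// the child process past the kernel's exec size limit (E2BIG).
public class EnvTruncation {
    // Assumed threshold for the example; not a value from the patch.
    static final int SAFE_ENV_LIMIT = 20 * 1024;

    static Map<String, String> buildSafeEnvironment(Map<String, String> conf) {
        Map<String, String> env = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            String value = e.getValue();
            if (value.length() > SAFE_ENV_LIMIT) {
                // Truncate rather than drop, mirroring the Hive-style
                // approach mentioned in the comment above. User code that
                // needs the full value would then fail visibly.
                value = value.substring(0, SAFE_ENV_LIMIT);
            }
            env.put(e.getKey(), value);
        }
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        StringBuilder huge = new StringBuilder();
        for (int i = 0; i < SAFE_ENV_LIMIT + 100; i++) {
            huge.append('x');
        }
        conf.put("mapreduce_input_fileinputformat_inputdir", huge.toString());
        conf.put("mapreduce_job_name", "streaming-job");
        Map<String, String> safe = buildSafeEnvironment(conf);
        System.out.println("inputdir length after capping: "
            + safe.get("mapreduce_input_fileinputformat_inputdir").length());
    }
}
```

A map built this way would be handed to `ProcessBuilder.environment()` before `start()`, which is the call that fails with "error=7, Argument list too long" in the trace above. The issue description also asks for this behavior to sit behind a config flag that defaults to off; that flag is omitted here for brevity.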
[jira] [Updated] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high
[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated MAPREDUCE-5965: - Status: Patch Available (was: Open)