[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549661#comment-14549661 ]
Wilfred Spiegelenburg commented on MAPREDUCE-5965:
--------------------------------------------------

Arup: Do you mind if I assign this JIRA to myself? I would like to get this fixed in an upcoming release.

> Hadoop streaming throws error if list of input files is high. Error is:
> "error=7, Argument list too long at if number of input file is high"
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5965
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Arup Malakar
>            Assignee: Arup Malakar
>         Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch, MAPREDUCE-5965.patch
>
>
> Hadoop streaming exposes all the key/value pairs in the job conf as
> environment variables when it forks a process for the streaming code to run.
> Unfortunately, the variable mapreduce_input_fileinputformat_inputdir contains
> the list of input files, and Linux has a limit on the combined size of
> environment variables + arguments.
> Depending on how long the list of files and their full paths are, this
> variable can be pretty huge. Given that not all of these variables are even
> used, this stops the user from running a Hadoop job with a large number of
> files, even though it could be run.
> Linux returns E2BIG (error code 7) if the combined size exceeds the limit,
> and Java translates that to "error=7, Argument list too long". More:
> http://man7.org/linux/man-pages/man2/execve.2.html
> I suggest skipping a variable if its value is longer than a certain length.
> The catch is that user code that requires a skipped environment variable
> would then fail, so this should be controlled by a new config variable for
> skipping long variables, set to false by default. That way the user has to
> explicitly set it to true to enable this feature.
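
For readers following the thread, the proposal quoted above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the attached patch and not Hadoop's actual code: the class `EnvVarFilter`, the methods `toEnvName` and `buildEnv`, and the parameters `skipLong` and `maxValueLength` are all hypothetical names. The key-to-name mapping assumes that every character outside `[A-Za-z0-9]` becomes `_`, which matches how `mapreduce.input.fileinputformat.inputdir` shows up as `mapreduce_input_fileinputformat_inputdir` in the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hadoop streaming. */
final class EnvVarFilter {
    private EnvVarFilter() {}

    /**
     * Turns a job conf key into an environment-safe name by replacing every
     * character outside [A-Za-z0-9] with '_' (assumed mapping).
     */
    static String toEnvName(String key) {
        StringBuilder sb = new StringBuilder(key.length());
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            boolean alnum = (c >= '0' && c <= '9')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z');
            sb.append(alnum ? c : '_');
        }
        return sb.toString();
    }

    /**
     * Builds the environment for the forked streaming process.
     * When skipLong is false (the proposed default), every entry is exported.
     * When skipLong is true, entries whose value is longer than
     * maxValueLength characters are left out.
     * Note: the length is counted in chars here, while the kernel limit is
     * in bytes, so a real limit would need some headroom.
     */
    static Map<String, String> buildEnv(Map<String, String> jobConf,
                                        boolean skipLong,
                                        int maxValueLength) {
        Map<String, String> env = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : jobConf.entrySet()) {
            String value = entry.getValue();
            if (skipLong && value.length() > maxValueLength) {
                continue; // too long: do not export this variable
            }
            env.put(toEnvName(entry.getKey()), value);
        }
        return env;
    }
}
```

The resulting map could be copied into `ProcessBuilder.environment()` before calling `start()`. Keeping the flag off by default preserves today's behavior, so only users who opt in take the risk of a script missing a skipped variable.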
> Here is the exception:
> {code}
> Error: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>     ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 17 more
> Caused by: java.lang.RuntimeException: configuration exception
>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
>     at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>     ... 22 more
> Caused by: java.io.IOException: Cannot run program "/data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_000006/./rbenv_runner.sh": error=7, Argument list too long
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
>     ... 23 more
> Caused by: java.io.IOException: error=7, Argument list too long
>     at java.lang.UNIXProcess.forkAndExec(Native Method)
>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
>     at java.lang.ProcessImpl.start(ProcessImpl.java:130)
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
>     ... 24 more
> Container killed by the ApplicationMaster.
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> {code}
> Hive does a similar trick: HIVE-2372
> I have a patch for this, will soon submit a patch.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)