Thanks Jeremy. But I didn't follow 'redirect "stdout" to "stderr" at the entry point to your mapper and reducer'. I'm basically a Java Hadoop developer and have no background in Python programming. Could you please help me with more details, i.e. the exact lines of code I need to include to achieve this?
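To make sure I'm not misreading you: from some searching, is something along these lines what you meant? A rough sketch of my current understanding (the variable name and the word-count loop are just my own stand-ins for whatever WcStreamMap.py actually does, so please correct me if this is not the idiomatic way):

#!/usr/bin/env python
import sys

# Keep a handle on the real stdout for emitting key/value pairs, then
# point sys.stdout at stderr so that any stray "print" (mine or from a
# module imported later) cannot corrupt the stream Hadoop reads.
hadoop_output = sys.stdout
sys.stdout = sys.stderr

# ... all remaining imports and the real mapper logic go below ...

for line in sys.stdin:
    for word in line.split():
        hadoop_output.write('%s\t1\n' % word)

If I followed you correctly, the same two lines at the top of WcStreamReduce.py would cover the reducer as well.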
Also, I tried a still deeper drill-down on my error logs and found the following lines as well:

*stderr logs*

/usr/bin/env: python : No such file or directory
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
log4j:WARN Please initialize the log4j system properly.

I verified that '/usr/bin/env' does exist on the machine. Could you please provide a little more guidance on the same?
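One more observation, in case it is relevant: in the stderr log above, env complains about "python " with a stray space before the colon. Could that mean the shebang line of my scripts has a trailing space or a Windows carriage return after the interpreter name (I assume it should read exactly "#!/usr/bin/env python", with nothing after it)? To check, I plan to run this small standalone snippet against the mapper (the path is the one from my command below; this is just my own diagnostic, nothing Hadoop-specific):

with open('/home/streaming/WcStreamMap.py', 'rb') as f:
    # read the first line in binary mode so hidden characters survive
    first_line = f.readline()

# repr() makes invisible characters printable; a '\r' or a trailing
# space before the newline would explain why env cannot find "python"
print(repr(first_line))

If a carriage return does show up, re-saving the scripts with plain Unix line endings would presumably be the fix, but I may be off track here.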
On Tue, Sep 13, 2011 at 9:06 AM, Jeremy Lewi <jer...@lewi.us> wrote:

> Bejoy,
>
> The other problem I typically ran into using Python streaming jobs was my
> mapper or reducer writing to stdout. Since Hadoop uses your script's stdout
> to pass data back to the framework, any erroneous "print" statements will
> cause the pipe to break. The easiest way around this is to redirect
> "stdout" to "stderr" at the entry point to your mapper and reducer; do this
> even before you import any modules, so that even if those modules call
> "print" it gets redirected.
>
> Note: if you're using dumbo (but I don't think you are) the above solution
> may not work, but I can send you a pointer.
>
> J
>
> On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:
>
>> Thanks Jeremy. I tried your first suggestion and the mappers ran to
>> completion, but then the reducers failed with another exception related
>> to pipes. I believe it may be due to permission issues again. I tried
>> setting a few additional config parameters, but that didn't do the job.
>> Please find below the command used and the error logs from the jobtracker
>> web UI.
>>
>> hadoop jar \
>>   /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
>>   -D hadoop.tmp.dir=/home/streaming/tmp/hadoop/ \
>>   -D dfs.data.dir=/home/streaming/tmp \
>>   -D mapred.local.dir=/home/streaming/tmp/local \
>>   -D mapred.system.dir=/home/streaming/tmp/system \
>>   -D mapred.temp.dir=/home/streaming/tmp/temp \
>>   -input /userdata/bejoy/apps/wc/input \
>>   -output /userdata/bejoy/apps/wc/output \
>>   -mapper /home/streaming/WcStreamMap.py \
>>   -reducer /home/streaming/WcStreamReduce.py
>>
>> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
>>     at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
>>     at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
>>     at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>
>> The folder permissions at the time of job execution are as follows:
>>
>> cloudera@cloudera-vm:~$ ls -l /home/streaming/
>> drwxrwxrwx 5 root root 4096 2011-09-12 05:59 tmp
>> -rwxrwxrwx 1 root root  707 2011-09-11 23:42 WcStreamMap.py
>> -rwxrwxrwx 1 root root 1077 2011-09-11 23:42 WcStreamReduce.py
>>
>> cloudera@cloudera-vm:~$ ls -l /home/streaming/tmp/
>> drwxrwxrwx 2 root root 4096 2011-09-12 06:12 hadoop
>> drwxrwxrwx 2 root root 4096 2011-09-12 05:58 local
>> drwxrwxrwx 2 root root 4096 2011-09-12 05:59 system
>> drwxrwxrwx 2 root root 4096 2011-09-12 05:59 temp
>>
>> Am I missing something here?
>>
>> I haven't been on Linux for long, so I couldn't try your second
>> suggestion of setting up the Linux task controller.
>>
>> Thanks a lot.
>>
>> Regards
>> Bejoy.K.S
>>
>> On Mon, Sep 12, 2011 at 6:20 AM, Jeremy Lewi <jer...@lewi.us> wrote:
>>
>>> I would suggest you try putting your mapper/reducer .py files in a
>>> directory that is world-readable at every level, e.g. /tmp/test. I had
>>> similar problems when I was using streaming, and I believe my workaround
>>> was to put the mappers/reducers outside my home directory. The other,
>>> more involved alternative is to set up the Linux task controller so you
>>> can run your MR jobs as the user who submits them.
>>>
>>> J
>>>
>>> On Mon, Sep 12, 2011 at 2:18 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:
>>>
>>>> Hi
>>>> I wanted to try out Hadoop streaming and got the sample Python code for
>>>> the mapper and reducer. I copied both into my lfs and tried running the
>>>> streaming job as mentioned in the documentation.
>>>> Here is the command I used to run the job:
>>>>
>>>> hadoop jar \
>>>>   /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
>>>>   -input /userdata/bejoy/apps/wc/input \
>>>>   -output /userdata/bejoy/apps/wc/output \
>>>>   -mapper /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py \
>>>>   -reducer /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py
>>>>
>>>> Here, other than the input and output, all the rest are lfs locations.
>>>> However, the job is failing.
>>>> The error log from the jobtracker URL is as follows:
>>>>
>>>> java.lang.RuntimeException: Error in configuring object
>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>     ... 9 more
>>>> Caused by: java.lang.RuntimeException: Error in configuring object
>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>>>>     ... 14 more
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>     ... 17 more
>>>> Caused by: java.lang.RuntimeException: configuration exception
>>>>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
>>>>     at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>>>>     ... 22 more
>>>> Caused by: java.io.IOException: Cannot run program "/home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py": java.io.IOException: error=13, Permission denied
>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
>>>>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
>>>>     ... 23 more
>>>> Caused by: java.io.IOException: java.io.IOException: error=13, Permission denied
>>>>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
>>>>     at java.lang.ProcessImpl.start(ProcessImpl.java:65)
>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
>>>>     ... 24 more
>>>>
>>>> On seeing the error I checked the permissions of the mapper and
>>>> reducer, and issued a chmod 777 on them as well. Still no luck.
>>>> The permissions of the files are as follows:
>>>>
>>>> cloudera@cloudera-vm:~$ ls -l /home/cloudera/bejoy/apps/inputs/wc/
>>>> -rwxrwxrwx 1 cloudera cloudera  707 2011-09-11 23:42 WcStreamMap.py
>>>> -rwxrwxrwx 1 cloudera cloudera 1077 2011-09-11 23:42 WcStreamReduce.py
>>>>
>>>> I'm testing this on the Cloudera demo VM, so the Hadoop setup is in
>>>> pseudo-distributed mode. Any help would be highly appreciated.
>>>>
>>>> Thank you.
>>>>
>>>> Regards
>>>> Bejoy.K.S