Hi Harsh,

Thank you for the response. I'm on the Cloudera demo VM. It runs Hadoop 0.20 and has Python installed. Do I have to do any further installation or configuration to get Python running?
On Tue, Sep 13, 2011 at 1:36 PM, Harsh J <ha...@cloudera.com> wrote:
> The env binary would be present, but do all your TT nodes have Python
> properly installed on them? The env program can't find it, and that's
> probably why your scripts with a shebang line don't run.
>
> On Tue, Sep 13, 2011 at 1:12 PM, Bejoy KS <bejoy.had...@gmail.com> wrote:
> > Thanks Jeremy. But I didn't follow 'redirect "stdout" to "stderr" at the
> > entry point to your mapper and reducer'. Basically I'm a Java Hadoop
> > developer and have no idea about Python programming. Could you please
> > help me with more details, such as the lines of code I need to include
> > to achieve this?
> >
> > Also, drilling down deeper into my error logs, I found the following
> > lines as well:
> >
> > stderr logs
> >
> > /usr/bin/env: python
> > : No such file or directory
> > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
> >     at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
> >     at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
> >     at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
> >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> > log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
> > log4j:WARN Please initialize the log4j system properly.
> >
> > I verified the existence of '/usr/bin/env' and it is present.
> >
> > Could you please provide a little more guidance on the same?
> >
> > On Tue, Sep 13, 2011 at 9:06 AM, Jeremy Lewi <jer...@lewi.us> wrote:
> >> Bejoy,
> >> The other problem I typically ran into using Python streaming jobs was
> >> if my mapper or reducer wrote to stdout. Since streaming uses stdout to
> >> pass data back to Hadoop, any erroneous "print" statements will cause
> >> the pipe to break. The easiest way around this is to redirect "stdout"
> >> to "stderr" at the entry point of your mapper and reducer; do this even
> >> before you import any modules, so that even if those modules call
> >> "print" the output gets redirected.
> >> Note: if you're using Dumbo (but I don't think you are) the above
> >> solution may not work, but I can send you a pointer.
> >> J
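The redirect Jeremy describes amounts to a few lines at the very top of each script, before any other import. A minimal sketch, assuming plain CPython on the task nodes (the real_stdout and emit names are illustrative, not from Bejoy's actual scripts):

    #!/usr/bin/env python
    # Keep a private handle on the real stdout for emitting records, then
    # point sys.stdout at stderr, so stray "print" statements -- here or in
    # any module imported below -- cannot corrupt the stream Hadoop reads.
    import sys

    real_stdout = sys.stdout
    sys.stdout = sys.stderr

    # Remaining imports and the actual map/reduce logic go below this point.

    def emit(key, value):
        # Streaming expects tab-separated key/value records on the real stdout.
        real_stdout.write("%s\t%s\n" % (key, value))

The '#!/usr/bin/env python' shebang only works if python is on the PATH of every TaskTracker node, which is what Harsh is asking about. Note also that a stderr message of the form '/usr/bin/env: python' followed by ': No such file or directory' on a new line, as quoted above, is often the symptom of a script saved with DOS line endings, where an invisible carriage return gets appended to the interpreter name.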
> >>
> >> On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:
> >>> Thanks Jeremy. I tried your first suggestion and the mappers ran to
> >>> completion. But then the reducers failed with another exception related
> >>> to pipes. I believe it may be due to permission issues again. I tried
> >>> setting a few additional config parameters, but that didn't do the job.
> >>> Please find the command used and the error logs from the jobtracker web UI:
> >>>
> >>> hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
> >>>     -D hadoop.tmp.dir=/home/streaming/tmp/hadoop/ \
> >>>     -D dfs.data.dir=/home/streaming/tmp \
> >>>     -D mapred.local.dir=/home/streaming/tmp/local \
> >>>     -D mapred.system.dir=/home/streaming/tmp/system \
> >>>     -D mapred.temp.dir=/home/streaming/tmp/temp \
> >>>     -input /userdata/bejoy/apps/wc/input \
> >>>     -output /userdata/bejoy/apps/wc/output \
> >>>     -mapper /home/streaming/WcStreamMap.py \
> >>>     -reducer /home/streaming/WcStreamReduce.py
> >>>
> >>> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
> >>>     at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
> >>>     at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
> >>>     at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
> >>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
> >>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> >>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >>>
> >>> The folder permissions at the time of job execution were as follows:
> >>>
> >>> cloudera@cloudera-vm:~$ ls -l /home/streaming/
> >>> drwxrwxrwx 5 root root 4096 2011-09-12 05:59 tmp
> >>> -rwxrwxrwx 1 root root  707 2011-09-11 23:42 WcStreamMap.py
> >>> -rwxrwxrwx 1 root root 1077 2011-09-11 23:42 WcStreamReduce.py
> >>>
> >>> cloudera@cloudera-vm:~$ ls -l /home/streaming/tmp/
> >>> drwxrwxrwx 2 root root 4096 2011-09-12 06:12 hadoop
> >>> drwxrwxrwx 2 root root 4096 2011-09-12 05:58 local
> >>> drwxrwxrwx 2 root root 4096 2011-09-12 05:59 system
> >>> drwxrwxrwx 2 root root 4096 2011-09-12 05:59 temp
> >>>
> >>> Am I missing something here?
> >>>
> >>> I haven't been on Linux for long, so I couldn't try your second
> >>> suggestion on setting up the Linux task controller.
> >>>
> >>> Thanks a lot
> >>>
> >>> Regards
> >>> Bejoy.K.S
> >>>
> >>> On Mon, Sep 12, 2011 at 6:20 AM, Jeremy Lewi <jer...@lewi.us> wrote:
> >>>> I would suggest you try putting your mapper/reducer .py files in a
> >>>> directory that is world-readable at every level, e.g. /tmp/test. I had
> >>>> similar problems when I was using streaming, and I believe my
> >>>> workaround was to put the mappers/reducers outside my home directory.
> >>>> The other, more involved alternative is to set up the Linux task
> >>>> controller so you can run your MR jobs as the user who submits them.
> >>>> J
> >>>>
> >>>> On Mon, Sep 12, 2011 at 2:18 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:
> >>>>> Hi
> >>>>> I wanted to try out Hadoop streaming and got the sample Python code
> >>>>> for the mapper and reducer. I copied both into my LFS and tried
> >>>>> running the streaming job as mentioned in the documentation.
> >>>>> Here is the command I used to run the job:
> >>>>>
> >>>>> hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
> >>>>>     -input /userdata/bejoy/apps/wc/input \
> >>>>>     -output /userdata/bejoy/apps/wc/output \
> >>>>>     -mapper /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py \
> >>>>>     -reducer /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py
> >>>>>
> >>>>> Here, other than the input and output, everything else is on LFS
> >>>>> locations. However, the job is failing. The error log from the
> >>>>> jobtracker URL is as follows:
> >>>>>
> >>>>> java.lang.RuntimeException: Error in configuring object
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
> >>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
> >>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >>>>> Caused by: java.lang.reflect.InvocationTargetException
> >>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> >>>>>     ... 9 more
> >>>>> Caused by: java.lang.RuntimeException: Error in configuring object
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
> >>>>>     ... 14 more
> >>>>> Caused by: java.lang.reflect.InvocationTargetException
> >>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> >>>>>     ... 17 more
> >>>>> Caused by: java.lang.RuntimeException: configuration exception
> >>>>>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
> >>>>>     at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
> >>>>>     ... 22 more
> >>>>> Caused by: java.io.IOException: Cannot run program
> >>>>> "/home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py": java.io.IOException:
> >>>>> error=13, Permission denied
> >>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
> >>>>>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
> >>>>>     ... 23 more
> >>>>> Caused by: java.io.IOException: java.io.IOException: error=13,
> >>>>> Permission denied
> >>>>>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
> >>>>>     at java.lang.ProcessImpl.start(ProcessImpl.java:65)
> >>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
> >>>>>     ... 24 more
> >>>>>
> >>>>> Based on the error, I checked the permissions of the mapper and
> >>>>> reducer, and issued a chmod 777 command as well. Still no luck.
> >>>>>
> >>>>> The permissions of the files are as follows:
> >>>>> cloudera@cloudera-vm:~$ ls -l /home/cloudera/bejoy/apps/inputs/wc/
> >>>>> -rwxrwxrwx 1 cloudera cloudera  707 2011-09-11 23:42 WcStreamMap.py
> >>>>> -rwxrwxrwx 1 cloudera cloudera 1077 2011-09-11 23:42 WcStreamReduce.py
> >>>>>
> >>>>> I'm testing this on the Cloudera demo VM, so the Hadoop setup is in
> >>>>> pseudo-distributed mode. Any help would be highly appreciated.
> >>>>>
> >>>>> Thank You
> >>>>>
> >>>>> Regards
> >>>>> Bejoy.K.S
>
> --
> Harsh J
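The WcStreamMap.py and WcStreamReduce.py scripts themselves never appear in the thread. For reference, here is a minimal sketch of the standard Hadoop Streaming word-count pair they are presumably modelled on (Python 2 vintage, to match Hadoop 0.20; Bejoy's actual scripts may differ):

    #!/usr/bin/env python
    # WcStreamMap.py (sketch) -- emit "word<TAB>1" for every word on stdin.
    import sys

    for line in sys.stdin:
        for word in line.split():
            sys.stdout.write("%s\t1\n" % word)

and the matching reducer:

    #!/usr/bin/env python
    # WcStreamReduce.py (sketch) -- sum counts per word. Streaming sorts the
    # mapper output by key, so a running total per word is sufficient.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, _, count = line.rstrip("\n").partition("\t")
        if not count.isdigit():
            continue  # skip malformed records
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                sys.stdout.write("%s\t%d\n" % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        sys.stdout.write("%s\t%d\n" % (current_word, current_count))

A quick local smoke test, before involving Hadoop at all, assuming both scripts are executable and have Unix line endings:

    cat input.txt | ./WcStreamMap.py | sort | ./WcStreamReduce.py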