Hi Harsh,

Thank you for the response. I'm on the Cloudera demo VM. It runs Hadoop 0.20 and has Python installed. Do I have to do any further installation or configuration to get Python running?
On Tue, Sep 13, 2011 at 1:36 PM, Harsh J <ha...@cloudera.com> wrote:
> The env binary would be present, but do all your TT nodes have Python
> properly installed on them? The env program can't find it, and that's
> probably why your scripts with a shebang line don't run.
>
> On Tue, Sep 13, 2011 at 1:12 PM, Bejoy KS <bejoy.had...@gmail.com> wrote:
> > Thanks Jeremy. But I didn't follow 'redirect "stdout" to "stderr" at the
> > entry point to your mapper and reducer'. Basically I'm a Java Hadoop
> > developer and have no idea about Python programming. Could you please
> > help me with more details, such as the lines of code I need to include
> > to achieve this?
> >
> > Also, drilling down deeper into my error logs, I found the following
> > lines as well:
> >
> > stderr logs
> >
> > /usr/bin/env: python
> > : No such file or directory
> > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
> >     at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
> >     at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
> >     at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
> >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> > log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
> > log4j:WARN Please initialize the log4j system properly.
> >
> > I verified the existence of '/usr/bin/env' and it is present.
> >
> > Could you please provide a little more guidance on the same?
> >
> > On Tue, Sep 13, 2011 at 9:06 AM, Jeremy Lewi <jer...@lewi.us> wrote:
> >> Bejoy,
> >> The other problem I typically ran into using Python streaming jobs was
> >> if my mapper or reducer wrote to stdout. Since streaming uses stdout to
> >> pass data back to Hadoop, any erroneous "print" statements will cause
> >> the pipe to break. The easiest way around this is to redirect "stdout"
> >> to "stderr" at the entry point of your mapper and reducer; do this even
> >> before you import any modules, so that even if those modules call
> >> "print" the output gets redirected.
> >> Note: if you're using Dumbo (but I don't think you are) the above
> >> solution may not work, but I can send you a pointer.
> >> J
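The redirect Jeremy describes amounts to a few lines at the very top of each script, before any other import. A minimal sketch, assuming plain CPython on the task nodes (the real_stdout and emit names are illustrative, not from Bejoy's actual scripts):

    #!/usr/bin/env python
    # Keep a private handle on the real stdout for emitting records, then
    # point sys.stdout at stderr, so stray "print" statements -- here or in
    # any module imported below -- cannot corrupt the stream Hadoop reads.
    import sys

    real_stdout = sys.stdout
    sys.stdout = sys.stderr

    # Remaining imports and the actual map/reduce logic go below this point.

    def emit(key, value):
        # Streaming expects tab-separated key/value records on the real stdout.
        real_stdout.write("%s\t%s\n" % (key, value))

The '#!/usr/bin/env python' shebang only works if python is on the PATH of every TaskTracker node, which is what Harsh is asking about. Note also that a stderr message of the form '/usr/bin/env: python' followed by ': No such file or directory' on a new line, as quoted above, is often the symptom of a script saved with DOS line endings, where an invisible carriage return gets appended to the interpreter name.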
> >>
> >> On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:
> >>> Thanks Jeremy. I tried your first suggestion and the mappers ran to
> >>> completion. But then the reducers failed with another exception related
> >>> to pipes. I believe it may be due to permission issues again. I tried
> >>> setting a few additional config parameters, but that didn't do the job.
> >>> Please find the command used and the error logs from the jobtracker web UI:
> >>>
> >>> hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
> >>>     -D hadoop.tmp.dir=/home/streaming/tmp/hadoop/ \
> >>>     -D dfs.data.dir=/home/streaming/tmp \
> >>>     -D mapred.local.dir=/home/streaming/tmp/local \
> >>>     -D mapred.system.dir=/home/streaming/tmp/system \
> >>>     -D mapred.temp.dir=/home/streaming/tmp/temp \
> >>>     -input /userdata/bejoy/apps/wc/input \
> >>>     -output /userdata/bejoy/apps/wc/output \
> >>>     -mapper /home/streaming/WcStreamMap.py \
> >>>     -reducer /home/streaming/WcStreamReduce.py
> >>>
> >>> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
> >>>     at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
> >>>     at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
> >>>     at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
> >>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
> >>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> >>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >>>
> >>> The folder permissions at the time of job execution were as follows:
> >>>
> >>> cloudera@cloudera-vm:~$ ls -l /home/streaming/
> >>> drwxrwxrwx 5 root root 4096 2011-09-12 05:59 tmp
> >>> -rwxrwxrwx 1 root root  707 2011-09-11 23:42 WcStreamMap.py
> >>> -rwxrwxrwx 1 root root 1077 2011-09-11 23:42 WcStreamReduce.py
> >>>
> >>> cloudera@cloudera-vm:~$ ls -l /home/streaming/tmp/
> >>> drwxrwxrwx 2 root root 4096 2011-09-12 06:12 hadoop
> >>> drwxrwxrwx 2 root root 4096 2011-09-12 05:58 local
> >>> drwxrwxrwx 2 root root 4096 2011-09-12 05:59 system
> >>> drwxrwxrwx 2 root root 4096 2011-09-12 05:59 temp
> >>>
> >>> Am I missing something here?
> >>>
> >>> I haven't been on Linux for long, so I couldn't try your second
> >>> suggestion on setting up the Linux task controller.
> >>>
> >>> Thanks a lot
> >>>
> >>> Regards
> >>> Bejoy.K.S
> >>>
> >>> On Mon, Sep 12, 2011 at 6:20 AM, Jeremy Lewi <jer...@lewi.us> wrote:
> >>>> I would suggest you try putting your mapper/reducer .py files in a
> >>>> directory that is world-readable at every level, e.g. /tmp/test. I had
> >>>> similar problems when I was using streaming, and I believe my
> >>>> workaround was to put the mappers/reducers outside my home directory.
> >>>> The other, more involved alternative is to set up the Linux task
> >>>> controller so you can run your MR jobs as the user who submits them.
> >>>> J
> >>>>
> >>>> On Mon, Sep 12, 2011 at 2:18 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:
> >>>>> Hi
> >>>>> I wanted to try out Hadoop streaming and got the sample Python code
> >>>>> for the mapper and reducer. I copied both into my LFS and tried
> >>>>> running the streaming job as mentioned in the documentation.
> >>>>> Here is the command I used to run the job:
> >>>>>
> >>>>> hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
> >>>>>     -input /userdata/bejoy/apps/wc/input \
> >>>>>     -output /userdata/bejoy/apps/wc/output \
> >>>>>     -mapper /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py \
> >>>>>     -reducer /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py
> >>>>>
> >>>>> Here, other than the input and output, everything else is on LFS
> >>>>> locations. However, the job is failing. The error log from the
> >>>>> jobtracker URL is as follows:
> >>>>>
> >>>>> java.lang.RuntimeException: Error in configuring object
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
> >>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
> >>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >>>>> Caused by: java.lang.reflect.InvocationTargetException
> >>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> >>>>>     ... 9 more
> >>>>> Caused by: java.lang.RuntimeException: Error in configuring object
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
> >>>>>     ... 14 more
> >>>>> Caused by: java.lang.reflect.InvocationTargetException
> >>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> >>>>>     ... 17 more
> >>>>> Caused by: java.lang.RuntimeException: configuration exception
> >>>>>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
> >>>>>     at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
> >>>>>     ... 22 more
> >>>>> Caused by: java.io.IOException: Cannot run program
> >>>>> "/home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py": java.io.IOException:
> >>>>> error=13, Permission denied
> >>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
> >>>>>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
> >>>>>     ... 23 more
> >>>>> Caused by: java.io.IOException: java.io.IOException: error=13,
> >>>>> Permission denied
> >>>>>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
> >>>>>     at java.lang.ProcessImpl.start(ProcessImpl.java:65)
> >>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
> >>>>>     ... 24 more
> >>>>>
> >>>>> Based on the error, I checked the permissions of the mapper and
> >>>>> reducer, and issued a chmod 777 command as well. Still no luck.
> >>>>>
> >>>>> The permissions of the files are as follows:
> >>>>> cloudera@cloudera-vm:~$ ls -l /home/cloudera/bejoy/apps/inputs/wc/
> >>>>> -rwxrwxrwx 1 cloudera cloudera  707 2011-09-11 23:42 WcStreamMap.py
> >>>>> -rwxrwxrwx 1 cloudera cloudera 1077 2011-09-11 23:42 WcStreamReduce.py
> >>>>>
> >>>>> I'm testing this on the Cloudera demo VM, so the Hadoop setup is in
> >>>>> pseudo-distributed mode. Any help would be highly appreciated.
> >>>>>
> >>>>> Thank You
> >>>>>
> >>>>> Regards
> >>>>> Bejoy.K.S
>
> --
> Harsh J
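The WcStreamMap.py and WcStreamReduce.py scripts themselves never appear in the thread. For reference, here is a minimal sketch of the standard Hadoop Streaming word-count pair they are presumably modelled on (Python 2 vintage, to match Hadoop 0.20; Bejoy's actual scripts may differ):

    #!/usr/bin/env python
    # WcStreamMap.py (sketch) -- emit "word<TAB>1" for every word on stdin.
    import sys

    for line in sys.stdin:
        for word in line.split():
            sys.stdout.write("%s\t1\n" % word)

and the matching reducer:

    #!/usr/bin/env python
    # WcStreamReduce.py (sketch) -- sum counts per word. Streaming sorts the
    # mapper output by key, so a running total per word is sufficient.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, _, count = line.rstrip("\n").partition("\t")
        if not count.isdigit():
            continue  # skip malformed records
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                sys.stdout.write("%s\t%d\n" % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        sys.stdout.write("%s\t%d\n" % (current_word, current_count))

A quick local smoke test, before involving Hadoop at all, assuming both scripts are executable and have Unix line endings:

    cat input.txt | ./WcStreamMap.py | sort | ./WcStreamReduce.py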