Bejoy, to redirect stdout, add the lines

    import sys
    sys.stdout = sys.stderr
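For anyone following along, here is a minimal sketch of what that looks like at the entry point of a streaming mapper. This is illustrative only, not Bejoy's actual WcStreamMap.py; the trick of keeping a handle on the real stdout via sys.__stdout__ is one common way to still emit records after the redirect:

    #!/usr/bin/env python
    # Sketch of a streaming mapper with the redirect applied at the
    # entry point (illustrative only, not the actual WcStreamMap.py).
    import sys
    sys.stdout = sys.stderr   # stray print statements now go to stderr

    # ...any further imports are now safe, even if they print on import...

    emit = sys.__stdout__     # keep a handle on the real stdout for records

    for line in sys.stdin:
        for word in line.split():
            emit.write('%s\t1\n' % word)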
to the top of your .py files (i.e., right after the shebang line).

J

On Tue, Sep 13, 2011 at 1:42 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:

Hi Harsh,
Thank you for the response. I'm on the Cloudera demo VM. It is on hadoop 0.20 and has python installed. Do I have to do any further installation/configuration to get python running?

On Tue, Sep 13, 2011 at 1:36 PM, Harsh J <ha...@cloudera.com> wrote:

The env binary would be present, but do all your TT nodes have python properly installed on them? The env program can't find python, and that's probably why your scripts with a shebang don't run.

--
Harsh J

On Tue, Sep 13, 2011 at 1:12 PM, Bejoy KS <bejoy.had...@gmail.com> wrote:

Thanks Jeremy. But I didn't follow 'redirect "stdout" to "stderr" at the entry point to your mapper and reducer'. I'm basically a Java Hadoop developer and have no experience with Python programming. Could you please help me with more details, like the lines of code I need to include to achieve this?

Also, I drilled down further into my error logs and found the following lines as well:

stderr logs

    /usr/bin/env: python
    : No such file or directory
    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
    log4j:WARN Please initialize the log4j system properly.

I verified that '/usr/bin/env' exists, and it does.

Could you please provide a little more guidance on the same?

On Tue, Sep 13, 2011 at 9:06 AM, Jeremy Lewi <jer...@lewi.us> wrote:

Bejoy,
The other problem I typically ran into using python streaming jobs was if my mapper or reducer wrote to stdout. Since Hadoop uses stdout to pass data back to Hadoop, any erroneous "print" statements will cause the pipe to break. The easiest way around this is to redirect "stdout" to "stderr" at the entry point to your mapper and reducer; do this even before you import any modules, so that even if those modules call "print" it gets redirected.
Note: if you're using dumbo (but I don't think you are) the above solution may not work, but I can send you a pointer.

J
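To make Jeremy's point concrete: Hadoop streaming reads each line a mapper or reducer writes to stdout and splits it at the first tab into a key and a value, so any stray output is interpreted as data. A small sketch of the failure mode (the strings are purely illustrative):

    import sys

    # The streaming framework parses each stdout line as key<TAB>value:
    sys.stdout.write('hadoop\t1\n')   # parsed as key='hadoop', value='1'

    # A stray debug statement writes to that very same stream...
    print 'reached the main loop'     # ...and is parsed as a bogus record,
                                      # which is how the pipe "breaks"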
On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:

Thanks Jeremy. I tried your first suggestion and the mappers ran to completion. But then the reducers failed with another exception related to pipes. I believe it may be due to permission issues again. I tried setting a few additional config parameters, but that didn't do the job. Please find the command used and the error logs from the jobtracker web UI:

    hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
      -D hadoop.tmp.dir=/home/streaming/tmp/hadoop/ \
      -D dfs.data.dir=/home/streaming/tmp \
      -D mapred.local.dir=/home/streaming/tmp/local \
      -D mapred.system.dir=/home/streaming/tmp/system \
      -D mapred.temp.dir=/home/streaming/tmp/temp \
      -input /userdata/bejoy/apps/wc/input \
      -output /userdata/bejoy/apps/wc/output \
      -mapper /home/streaming/WcStreamMap.py \
      -reducer /home/streaming/WcStreamReduce.py

    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)

The folder permissions at the time of job execution are as follows:

    cloudera@cloudera-vm:~$ ls -l /home/streaming/
    drwxrwxrwx 5 root root 4096 2011-09-12 05:59 tmp
    -rwxrwxrwx 1 root root  707 2011-09-11 23:42 WcStreamMap.py
    -rwxrwxrwx 1 root root 1077 2011-09-11 23:42 WcStreamReduce.py

    cloudera@cloudera-vm:~$ ls -l /home/streaming/tmp/
    drwxrwxrwx 2 root root 4096 2011-09-12 06:12 hadoop
    drwxrwxrwx 2 root root 4096 2011-09-12 05:58 local
    drwxrwxrwx 2 root root 4096 2011-09-12 05:59 system
    drwxrwxrwx 2 root root 4096 2011-09-12 05:59 temp

Am I missing something here?

I haven't been working with Linux for long, so I couldn't try your second suggestion on setting up the Linux task controller.

Thanks a lot.

Regards
Bejoy.K.S

On Mon, Sep 12, 2011 at 6:20 AM, Jeremy Lewi <jer...@lewi.us> wrote:

I would suggest you try putting your mapper/reducer .py files in a directory that is world readable at every level, e.g. /tmp/test. I had similar problems when I was using streaming, and I believe my workaround was to put the mapper/reducers outside my home directory. The other, more involved alternative is to set up the Linux task controller so you can run your MR jobs as the user who submits them.

J
On Mon, Sep 12, 2011 at 2:18 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:

Hi,
I wanted to try out hadoop streaming, and got the sample python code for the mapper and reducer. I copied both onto my lfs and tried running the streaming job as mentioned in the documentation.

Here is the command I used to run the job:

    hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
      -input /userdata/bejoy/apps/wc/input \
      -output /userdata/bejoy/apps/wc/output \
      -mapper /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py \
      -reducer /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py

Here, other than the input and output, everything else is on lfs locations. However, the job is failing. The error log from the jobtracker URL is:

    java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        ... 9 more
    Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
        ... 14 more
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        ... 17 more
    Caused by: java.lang.RuntimeException: configuration exception
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        ... 22 more
    Caused by: java.io.IOException: Cannot run program "/home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py": java.io.IOException: error=13, Permission denied
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
        ... 23 more
    Caused by: java.io.IOException: java.io.IOException: error=13, Permission denied
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
        ... 24 more

On seeing the error I checked the permissions of the mapper and reducer, and issued a chmod 777 command as well. Still no luck.

The permissions of the files are as follows:

    cloudera@cloudera-vm:~$ ls -l /home/cloudera/bejoy/apps/inputs/wc/
    -rwxrwxrwx 1 cloudera cloudera  707 2011-09-11 23:42 WcStreamMap.py
    -rwxrwxrwx 1 cloudera cloudera 1077 2011-09-11 23:42 WcStreamReduce.py

I'm testing this on the Cloudera Demo VM, so the hadoop setup is in pseudo-distributed mode. Any help would be highly appreciated.

Thank you.

Regards
Bejoy.K.S
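For reference, the sample streaming wordcount reducer Bejoy mentions typically looks roughly like the sketch below. This is an illustrative stand-in, not his actual WcStreamReduce.py; it relies on the streaming framework delivering mapper output sorted by key, so equal words arrive on adjacent lines:

    #!/usr/bin/env python
    # Sketch of a typical streaming wordcount reducer (illustrative only,
    # not the actual WcStreamReduce.py). Input lines arrive sorted by key.
    import sys

    current_word = None
    count = 0

    for line in sys.stdin:
        word, value = line.rstrip('\n').split('\t', 1)
        if word == current_word:
            count += int(value)
        else:
            # key changed: flush the previous word's total
            if current_word is not None:
                print '%s\t%d' % (current_word, count)
            current_word = word
            count = int(value)

    # flush the final word
    if current_word is not None:
        print '%s\t%d' % (current_word, count)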