Thanks Jeremy. But I didn't follow 'redirect "stdout" to "stderr" at the entry point to your mapper and reducer'. I'm basically a Java Hadoop developer and have no background in Python programming. Could you please help me with more details, i.e. the exact lines of code I need to include to achieve this?
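To make sure I'm not misreading you: from some searching, is something along these lines what you meant? A rough sketch of my current understanding (the variable name and the word-count loop are just my own stand-ins for whatever WcStreamMap.py actually does, so please correct me if this is not the idiomatic way):

#!/usr/bin/env python
import sys

# Keep a handle on the real stdout for emitting key/value pairs, then
# point sys.stdout at stderr so that any stray "print" (mine or from a
# module imported later) cannot corrupt the stream Hadoop reads.
hadoop_output = sys.stdout
sys.stdout = sys.stderr

# ... all remaining imports and the real mapper logic go below ...

for line in sys.stdin:
    for word in line.split():
        hadoop_output.write('%s\t1\n' % word)

If I followed you correctly, the same two lines at the top of WcStreamReduce.py would cover the reducer as well.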
Also, I tried a still deeper drill-down on my error logs and found the following lines as well:

*stderr logs*

/usr/bin/env: python : No such file or directory
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
log4j:WARN Please initialize the log4j system properly.

I verified that '/usr/bin/env' does exist on the machine. Could you please provide a little more guidance on the same?
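One more observation, in case it is relevant: in the stderr log above, env complains about "python " with a stray space before the colon. Could that mean the shebang line of my scripts has a trailing space or a Windows carriage return after the interpreter name (I assume it should read exactly "#!/usr/bin/env python", with nothing after it)? To check, I plan to run this small standalone snippet against the mapper (the path is the one from my command below; this is just my own diagnostic, nothing Hadoop-specific):

with open('/home/streaming/WcStreamMap.py', 'rb') as f:
    # read the first line in binary mode so hidden characters survive
    first_line = f.readline()

# repr() makes invisible characters printable; a '\r' or a trailing
# space before the newline would explain why env cannot find "python"
print(repr(first_line))

If a carriage return does show up, re-saving the scripts with plain Unix line endings would presumably be the fix, but I may be off track here.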
On Tue, Sep 13, 2011 at 9:06 AM, Jeremy Lewi <jer...@lewi.us> wrote:

> Bejoy,
>
> The other problem I typically ran into using Python streaming jobs was my
> mapper or reducer writing to stdout. Since Hadoop uses your script's stdout
> to pass data back to the framework, any erroneous "print" statements will
> cause the pipe to break. The easiest way around this is to redirect
> "stdout" to "stderr" at the entry point to your mapper and reducer; do this
> even before you import any modules, so that even if those modules call
> "print" it gets redirected.
>
> Note: if you're using dumbo (but I don't think you are) the above solution
> may not work, but I can send you a pointer.
>
> J
>
> On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:
>
>> Thanks Jeremy. I tried your first suggestion and the mappers ran to
>> completion, but then the reducers failed with another exception related
>> to pipes. I believe it may be due to permission issues again. I tried
>> setting a few additional config parameters, but that didn't do the job.
>> Please find below the command used and the error logs from the jobtracker
>> web UI.
>>
>> hadoop jar \
>>   /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
>>   -D hadoop.tmp.dir=/home/streaming/tmp/hadoop/ \
>>   -D dfs.data.dir=/home/streaming/tmp \
>>   -D mapred.local.dir=/home/streaming/tmp/local \
>>   -D mapred.system.dir=/home/streaming/tmp/system \
>>   -D mapred.temp.dir=/home/streaming/tmp/temp \
>>   -input /userdata/bejoy/apps/wc/input \
>>   -output /userdata/bejoy/apps/wc/output \
>>   -mapper /home/streaming/WcStreamMap.py \
>>   -reducer /home/streaming/WcStreamReduce.py
>>
>> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
>>     at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
>>     at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
>>     at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>
>> The folder permissions at the time of job execution are as follows:
>>
>> cloudera@cloudera-vm:~$ ls -l /home/streaming/
>> drwxrwxrwx 5 root root 4096 2011-09-12 05:59 tmp
>> -rwxrwxrwx 1 root root  707 2011-09-11 23:42 WcStreamMap.py
>> -rwxrwxrwx 1 root root 1077 2011-09-11 23:42 WcStreamReduce.py
>>
>> cloudera@cloudera-vm:~$ ls -l /home/streaming/tmp/
>> drwxrwxrwx 2 root root 4096 2011-09-12 06:12 hadoop
>> drwxrwxrwx 2 root root 4096 2011-09-12 05:58 local
>> drwxrwxrwx 2 root root 4096 2011-09-12 05:59 system
>> drwxrwxrwx 2 root root 4096 2011-09-12 05:59 temp
>>
>> Am I missing something here?
>>
>> I haven't been on Linux for long, so I couldn't try your second
>> suggestion of setting up the Linux task controller.
>>
>> Thanks a lot.
>>
>> Regards
>> Bejoy.K.S
>>
>> On Mon, Sep 12, 2011 at 6:20 AM, Jeremy Lewi <jer...@lewi.us> wrote:
>>
>>> I would suggest you try putting your mapper/reducer .py files in a
>>> directory that is world-readable at every level, e.g. /tmp/test. I had
>>> similar problems when I was using streaming, and I believe my workaround
>>> was to put the mappers/reducers outside my home directory. The other,
>>> more involved alternative is to set up the Linux task controller so you
>>> can run your MR jobs as the user who submits them.
>>>
>>> J
>>>
>>> On Mon, Sep 12, 2011 at 2:18 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:
>>>
>>>> Hi
>>>> I wanted to try out Hadoop streaming and got the sample Python code for
>>>> the mapper and reducer. I copied both into my lfs and tried running the
>>>> streaming job as mentioned in the documentation.
>>>> Here is the command I used to run the job:
>>>>
>>>> hadoop jar \
>>>>   /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
>>>>   -input /userdata/bejoy/apps/wc/input \
>>>>   -output /userdata/bejoy/apps/wc/output \
>>>>   -mapper /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py \
>>>>   -reducer /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py
>>>>
>>>> Here, other than the input and output, all the rest are lfs locations.
>>>> However, the job is failing.
>>>> The error log from the jobtracker URL is as follows:
>>>>
>>>> java.lang.RuntimeException: Error in configuring object
>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>     ... 9 more
>>>> Caused by: java.lang.RuntimeException: Error in configuring object
>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>>>>     ... 14 more
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>     ... 17 more
>>>> Caused by: java.lang.RuntimeException: configuration exception
>>>>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
>>>>     at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>>>>     ... 22 more
>>>> Caused by: java.io.IOException: Cannot run program "/home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py": java.io.IOException: error=13, Permission denied
>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
>>>>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
>>>>     ... 23 more
>>>> Caused by: java.io.IOException: java.io.IOException: error=13, Permission denied
>>>>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
>>>>     at java.lang.ProcessImpl.start(ProcessImpl.java:65)
>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
>>>>     ... 24 more
>>>>
>>>> On seeing the error I checked the permissions of the mapper and
>>>> reducer, and issued a chmod 777 on them as well. Still no luck.
>>>> The permissions of the files are as follows:
>>>>
>>>> cloudera@cloudera-vm:~$ ls -l /home/cloudera/bejoy/apps/inputs/wc/
>>>> -rwxrwxrwx 1 cloudera cloudera  707 2011-09-11 23:42 WcStreamMap.py
>>>> -rwxrwxrwx 1 cloudera cloudera 1077 2011-09-11 23:42 WcStreamReduce.py
>>>>
>>>> I'm testing this on the Cloudera demo VM, so the Hadoop setup is in
>>>> pseudo-distributed mode. Any help would be highly appreciated.
>>>>
>>>> Thank you.
>>>>
>>>> Regards
>>>> Bejoy.K.S