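Are you passing the Python script to the cluster using the -file
option? e.g. -mapper -file
Streaming only ships a local script to the task nodes when it is listed
with -file; without it the task cannot find the script, which would be
consistent with the "No such file or directory" error.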
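
To make the -file suggestion concrete, here is a minimal sketch that assembles the streaming command with the script passed to both -mapper and -file. The jar path, reducer class, and input/output arguments are copied from the command quoted below. The script name mapper.py and the helper build_streaming_cmd are hypothetical, since the original script name is not shown. It only builds and prints the command and does not run Hadoop.

```python
import shlex

# Jar path as given in the quoted command below.
STREAMING_JAR = (
    "/usr/lib/hadoop-0.20/contrib/streaming/"
    "hadoop-0.20.1+133-streaming.jar"
)


def build_streaming_cmd(script, input_glob, output_dir, jar=STREAMING_JAR):
    """Return the argv list for a streaming job that ships `script` via -file.

    `script` is a hypothetical local file name; -file copies it into each
    task's working directory so that -mapper can find it there.
    """
    return [
        "hadoop", "jar", jar,
        "-mapper", script,
        "-reducer", "org.apache.hadoop.mapred.lib.IdentityReducer",
        "-file", script,
        "-input", input_glob,
        "-output", output_dir,
    ]


# "mapper.py" is an assumed name, not the poster's actual script.
cmd = build_streaming_cmd("mapper.py", "test_input/*", "output")

# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The script still needs to be executable (chmod +x) with a shebang line that is valid on the worker node for the bare -mapper form to work.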


On Wed, Feb 17, 2010 at 7:45 PM, Dan Starr <> wrote:
> Hi, I've tried posting this to Cloudera's community support site, but
> the community website returns various server
> errors at the moment.  I believe the following is an issue related to
> my environment within Cloudera's Training virtual machine.
> Despite having success running Hadoop streaming on other Hadoop
> clusters and on Cloudera's Training VM in local mode, I'm currently
> getting an error when attempting to run a simple Hadoop streaming job
> in the normal queue-based mode on the Training VM.  I'm thinking the
> error described below is an issue related to the worker node not
> recognizing the python reference in the script's top shebang line.
> The hadoop command I am executing is:
> hadoop jar 
> /usr/lib/hadoop-0.20/contrib/streaming/hadoop-0.20.1+133-streaming.jar
> -mapper -reducer org.apache.hadoop.mapred.lib.IdentityReducer
> -input test_input/* -output output
> Where the test_input directory contains 3 UNIX formatted, single line files:
> training-vm: 3$ hadoop dfs -ls /user/training/test_input/
> Found 3 items
> -rw-r--r--   1 training supergroup         11 2010-02-17 10:48 /user/training/test_input/file1
> -rw-r--r--   1 training supergroup         11 2010-02-17 10:48 /user/training/test_input/file2
> -rw-r--r--   1 training supergroup         11 2010-02-17 10:48 /user/training/test_input/file3
> training-vm: 3$ hadoop dfs -cat /user/training/test_input/*
> test_line1
> test_line2
> test_line3
> And where looks like (UNIX formatted):
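> #!/usr/bin/python
> import sys
> # Echo each input line unchanged; the line already ends with a newline,
> # so write it directly instead of using print (which would add another).
> for line in sys.stdin:
>     sys.stdout.write(line)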
> The resulting Hadoop-Streaming error is:
> Cannot run program "":
> error=2, No such file or directory
> at java.lang.ProcessBuilder.start(
> at org.apache.hadoop.streaming.PipeMapRed.configure(
>    ...
> I get the same error when placing the python script on the HDFS, and
> then using this in the hadoop command:
> ... -mapper hdfs:///user/training/ ...
> One suggestion found online, which may not be relevant to Cloudera's
> distribution, mentions that the first line of the hadoop-streaming
> python script (the shebang line) may not describe an applicable path
> for the system.  The solution mentioned is to use: ... -mapper "python
> " ... in the Hadoop streaming command.  This doesn't seem to
> work correctly for me, since I find that the lines from the input data
> files are also parsed by the Python interpreter.  But this does reveal
> that python is available on the worker node when using this technique.
>  I have also tried without success the '-mapper' technique
> using shebang lines: "#!/usr/bin/env python", although on the training
> VM Python is installed under /usr/bin/python.
> Maybe the issue is something else.  Any suggestions or insights will be 
> helpful.
