I haven't done this using Hadoop, but before 0.16.4 I had written my
own distributed batch processor using HDFS as a common file store
and remote execution of Python scripts.
They all required a custom module which was copied to the remote temp
folders (a primitive implementation of -cacheFile).
So this is what I did: just after #!/usr/bin/env python
import sys
sys.path.append('.')
import mylib
dostuff
so that the task's working directory (where the cached file ends up)
is on the module search path. It should work thereafter.
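To see why this works, here is a minimal, self-contained sketch (not Hadoop itself): it simulates the streaming task by dropping a mylib.py into a temp directory, changing into it the way streaming runs the mapper in its task dir, and showing that the import only succeeds once '.' is on sys.path. The temp dir and the greet() helper are stand-ins, not anything from the real mylib.

```python
import os
import sys
import tempfile

# Simulate what -cacheFile does: place mylib.py in the task's temp dir.
task_dir = tempfile.mkdtemp()
with open(os.path.join(task_dir, 'mylib.py'), 'w') as f:
    f.write("def greet():\n    return 'hello from mylib'\n")

os.chdir(task_dir)    # streaming executes the mapper in its temp dir
sys.path.append('.')  # the fix: put the working directory on sys.path
import mylib          # now resolvable

print(mylib.greet())
```

The same two lines at the top of the real mapper script are all that's needed; nothing else about the job changes.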
Regards
Saptarshi
On May 22, 2008, at 7:39 PM, Martin Blom wrote:
Hello all,
I'm trying to stream a little python script on my small hadoop
cluster, and it doesn't work like I thought it would.
The script looks something like
#!/usr/bin/env python
import mylib
dostuff
where mylib is a small python library that I want included, and I
launch the whole thing with something like
bin/hadoop jar contrib/streaming/hadoop-0.16.4-streaming.jar
-cacheFile "hdfs://master:54310/user/hadoop/mylib.py#mylib.py" -file
script.py -mapper "script.py" -input input -output output
so it seems to me like the library should be available to the script.
When I run the script locally on my machine, everything works perfectly
fine. However, when I run it on the cluster, the script can't find the
library.
Does hadoop do anything strange to default paths? Am I missing
something obvious? Any pointers or ideas on how to fix this would be
great.
Martin Blom
Saptarshi Guha | [EMAIL PROTECTED] | http://www.stat.purdue.edu/~sguha