I recommend that you use Dumbo and TypedBytes to work with binary data from Python with Hadoop streaming:
http://wiki.github.com/klbostee/dumbo/
http://dumbotics.com/2009/02/24/hadoop-1722-and-typed-bytes/

All of the TypedBytes patches should now be included in all of the Cloudera Hadoop distributions.

Zak

On Thu, Jan 14, 2010 at 8:11 AM, Mayra Mendoza <aryam...@gmail.com> wrote:
> I need open and save image using hadoop and python, i'm tried two way to do
> this:
>
> 1. Using WholeFileInputFormat.class
> for infile in sys.stdin:
>     data = str(infile)
>     data = StringIO.StringIO(data).getvalue()
>     image = Image.frombuffer("RGB", (128,128), str(data), "raw","RGB",0,1)
>     image = image.convert('L')
>     print '%s\t' % (data)
>
> ERROR
> File "/usr/lib/python2.5/site-packages/PIL/Image.py", line 576, in
> fromstring
>     raise ValueError("not enough image data")
> ValueError: not enough image data
>
> 2.
> for infile in sys.stdin:
>     pathIn = os.getenv('map_input_file')
>     image = Image.open(pathIn)
>
> ERROR
> IOError: [Errno 2] No such file or directory:
> 'hdfs://localhost:8022/user/training/input/img10.jpg'
>
> This dir exist....!
> train...@training-vm:~$ hadoop fs -lsr /user/training/input/
> -rw-r--r-- 1 training supergroup 1556016 2010-01-13 08:22
> /user/training/input/img10.jpg
>
> Can you help me, what i do wrong??, I'm using hadoop 0.20.1, python and PIL
>
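
On the two errors in the quoted message: Image.frombuffer in "raw" mode expects uncompressed pixels (128*128*3 = 49152 bytes for RGB), while the input is a JPEG file, and iterating over sys.stdin line by line splits binary data at every newline byte. In the second attempt, map_input_file holds an hdfs:// URI, which Image.open cannot treat as a local path. To illustrate why typed bytes helps, here is a rough sketch of reading length-prefixed key/value pairs from a binary stream. The type codes (0 for bytes, 7 for string) and the 4-byte big-endian length are my reading of the typed bytes format, and the function names are made up for this example; in practice Dumbo and the typedbytes module should do this for you.

```python
import io
import struct

# Type codes assumed from the typed bytes format (HADOOP-1722):
# 0 = raw bytes, 7 = UTF-8 string. Other codes are not handled in this sketch.
TYPE_BYTES = 0
TYPE_STRING = 7


def read_typed_value(stream):
    """Read one value from a binary stream; return None at end of stream."""
    code = stream.read(1)
    if not code:
        return None
    type_code = ord(code)
    if type_code not in (TYPE_BYTES, TYPE_STRING):
        raise ValueError("unsupported typed bytes code: %d" % type_code)
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("truncated length prefix")
    # The length is a 32-bit big-endian signed integer.
    (length,) = struct.unpack(">i", header)
    payload = stream.read(length)
    if len(payload) != length:
        raise ValueError("truncated payload")
    if type_code == TYPE_STRING:
        return payload.decode("utf-8")
    return payload


def read_pairs(stream):
    """Yield (key, value) pairs until the stream is exhausted."""
    while True:
        key = read_typed_value(stream)
        if key is None:
            return
        value = read_typed_value(stream)
        if value is None:
            raise ValueError("key without a value")
        yield key, value
```

Because every value carries its own length, newline and NUL bytes inside an image survive intact. Each value would then be the complete file contents, which could be handed to PIL through a file-like wrapper, for example Image.open(StringIO.StringIO(value)) (io.BytesIO on newer Pythons), instead of Image.frombuffer.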