Just in addition to my previous post: you don't have to store the encoded files in a file system, of course, since you can write your own InputFormat that does the encoding on the fly. The overhead should not be that big.
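Below is a minimal sketch, in plain Java with no Hadoop classes so it runs standalone, of the record such an on-the-fly encoding InputFormat could produce and of the decoding the mapper would then do. The class and method names are hypothetical. The `<file name><TAB><Base64 data>` line layout is an assumption, based on the mapper seeing each record as one `key<TAB>value` line on stdin. `java.util.Base64` needs Java 8 or later. The actual InputFormat/RecordReader wiring is not shown.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of Hadoop: the text line an on-the-fly
// encoding InputFormat could hand to a streaming mapper, plus the decoding
// the mapper would do on its side.
final class Base64Record {

    // Encode one binary file (e.g. a JPEG) as "<file name>\t<Base64 data>".
    // The basic (non-MIME) encoder never inserts line breaks, so the whole
    // file ends up on a single line. Assumes the file name contains no tab.
    static String encode(String fileName, byte[] contents) {
        return fileName + "\t" + Base64.getEncoder().encodeToString(contents);
    }

    // Mapper side: the part before the first tab is the file name...
    static String decodeKey(String line) {
        return line.substring(0, line.indexOf('\t'));
    }

    // ...and the part after it decodes back to the original bytes.
    static byte[] decodeValue(String line) {
        return Base64.getDecoder().decode(line.substring(line.indexOf('\t') + 1));
    }

    public static void main(String[] args) {
        // Stand-in for image data: the start of a JPEG header plus a newline
        // (0x0A) and a tab (0x09), bytes that would break a raw text record.
        byte[] image = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0, 0x00, 0x10, 0x0A, 0x09};
        String line = encode("photo.jpg", image);
        System.out.println(line);
        System.out.println(decodeKey(line) + " round-trips: " + Arrays.equals(image, decodeValue(line)));
    }
}
```

The Base64 alphabet contains no tab or newline, so each image stays on one line and the first tab cleanly separates the name from the data. A file name that itself contains a tab would break that split, and this sketch does not guard against it.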
Piotr

2009/5/14 Piotr Praczyk <piotr.prac...@gmail.com>

> Hi
>
> If you want to read the files from HDFS and cannot pass the binary data,
> you can do some encoding of it (Base64 for example, but you can think about
> something more efficient, since the range of characters acceptable in the input
> string is wider than that used by Base64). It should solve the problem until
> some kind of binary input is supported (is it going to happen?).
>
> Piotr
>
> 2009/5/14 openresearch <qiming...@openresearchinc.com>
>
>> All,
>>
>> I have read some recommendations regarding image (binary input) processing
>> using Hadoop Streaming, which only accepts text out of the box for now.
>> http://hadoop.apache.org/core/docs/current/streaming.html
>> https://issues.apache.org/jira/browse/HADOOP-1722
>> http://markmail.org/message/24woaqie2a6mrboc
>>
>> However, I have not got any straight answer.
>>
>> One recommendation is to put the image data on HDFS, but then we have to do
>> "hdf -get" for each file/dir and process it locally, which is very expensive.
>>
>> Another recommendation is to "...put them in a centralized place where all
>> the hadoop nodes can access them (via e.g. NFS mount)..." Obviously, IO
>> will become a bottleneck, and that defeats the purpose of distributed
>> processing.
>>
>> I also noticed that an enhancement ticket is open for hadoop-core. Has it
>> been committed to any svn (0.21) branch? Can somebody show me an example of
>> how to take *.jpg files (from HDFS) and process them in a distributed fashion
>> using streaming?
>>
>> Many thanks
>>
>> -Qiming
>> --
>> View this message in context:
>> http://www.nabble.com/hadoop-streaming-binary-input---image-processing-tp23544344p23544344.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
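To put rough numbers on the Base64 overhead and on the "more efficient" encodings mentioned above, here is a small Java calculation. Ascii85 is used only as an example of such an encoding; the thread does not name one. It draws on 85 printable ASCII characters ('!' to 'u', so no tab or newline) and packs 4 bytes into 5 characters. The 300 KB file size is made up for illustration.

```java
import java.util.Base64;
import java.util.Locale;

// Output size of a few binary-to-text encodings for an n-byte input.
final class EncodingOverhead {

    // Base64: every 3 bytes become 4 characters; the last group is padded.
    static long base64Length(long n) {
        return 4 * ((n + 2) / 3);
    }

    // Hex: 2 characters per byte, the naive alternative.
    static long hexLength(long n) {
        return 2 * n;
    }

    // Ascii85 without "<~ ~>" delimiters and without the "z" shortcut for
    // all-zero groups: every 4 bytes become 5 characters, and a final partial
    // group of k bytes becomes k + 1 characters.
    static long ascii85Length(long n) {
        long rest = n % 4;
        return 5 * (n / 4) + (rest == 0 ? 0 : rest + 1);
    }

    // Size increase over the raw binary, as a percentage string.
    static String percent(long encoded, long n) {
        return String.format(Locale.ROOT, "%.1f%%", 100.0 * (encoded - n) / n);
    }

    public static void main(String[] args) {
        long n = 300_000; // hypothetical 300 KB JPEG
        System.out.println("base64:  " + base64Length(n) + " chars, +" + percent(base64Length(n), n));
        System.out.println("ascii85: " + ascii85Length(n) + " chars, +" + percent(ascii85Length(n), n));
        System.out.println("hex:     " + hexLength(n) + " chars, +" + percent(hexLength(n), n));

        // Sanity check of the Base64 formula against the JDK encoder.
        byte[] sample = new byte[1001];
        System.out.println(Base64.getEncoder().encodeToString(sample).length() == base64Length(sample.length));
    }
}
```

For inputs of this size, Base64 adds about a third to the data that streaming has to push through stdin, Ascii85 about a quarter, and hex would double it.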