All, I have read some recommendations regarding image (binary-input) processing with Hadoop Streaming, which only accepts text out of the box for now:
http://hadoop.apache.org/core/docs/current/streaming.html
https://issues.apache.org/jira/browse/HADOOP-1722
http://markmail.org/message/24woaqie2a6mrboc
However, I have not gotten a straight answer. One recommendation is to put the image data on HDFS, but then we have to do a "hadoop fs -get" for each file/dir and process it locally, which is very expensive. Another recommendation is to "...put them in a centralized place where all the hadoop nodes can access them (via .e.g, NFS mount)..." Obviously, I/O then becomes the bottleneck, which defeats the purpose of distributed processing.

I also noticed that an enhancement ticket is open against hadoop-core. Has it been committed to any svn (0.21) branch? Can somebody show me an example of how to take *.jpg files (from HDFS) and process them in a distributed fashion using streaming?

Many thanks

-Qiming
--
View this message in context: http://www.nabble.com/hadoop-streaming-binary-input---image-processing-tp23544344p23544344.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
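For reference, one common workaround that avoids both the per-file "get" and the NFS bottleneck: make the streaming job's *input* a plain-text file listing HDFS paths (one *.jpg path per line), so the text-only limitation applies only to the path list, and each mapper task fetches and processes just the images it was assigned. Below is a minimal mapper sketch under those assumptions; it assumes the `hadoop fs -cat` client is available on every node, and the sha1/size output is a placeholder for real image processing.

```python
#!/usr/bin/env python
# Hypothetical streaming mapper: stdin carries HDFS *paths* (text),
# not image bytes.  Each mapper fetches only its assigned images,
# so reads are spread across the cluster instead of one NFS mount.
import hashlib
import subprocess
import sys

def fetch_from_hdfs(path):
    # "hadoop fs -cat" streams the file's raw bytes to stdout.
    # Assumes a configured Hadoop client on every task node.
    return subprocess.check_output(["hadoop", "fs", "-cat", path])

def map_line(line, fetch=fetch_from_hdfs):
    # Emit "path<TAB>sha1<TAB>size" -- a stand-in for whatever image
    # processing you actually need.  Returns None for blank lines.
    path = line.strip()
    if not path:
        return None
    data = fetch(path)
    return "%s\t%s\t%d" % (path, hashlib.sha1(data).hexdigest(), len(data))

if __name__ == "__main__":
    # Hadoop Streaming feeds the path list on stdin, one line per record.
    for line in sys.stdin:
        out = map_line(line)
        if out:
            print(out)
```

A job launch might then look something like `hadoop jar hadoop-streaming.jar -input /user/qiming/paths.txt -output /user/qiming/out -mapper mapper.py -file mapper.py -numReduceTasks 0` (paths and flags here are illustrative, not taken from your setup). The trade-off is that data locality is lost, since the mapper reading a path is not necessarily on a node holding that image's blocks.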