Hi

If you want to read the files from HDFS but cannot pass binary data through
streaming, you can encode it as text first (Base64, for example, though
something more efficient is possible, since the range of characters
acceptable in the input string is wider than the one Base64 uses). That
should solve the problem until some kind of binary input is supported (is
that going to happen?).
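
A minimal Python sketch of the encoding idea, in case it helps. The record
layout (file name, a tab, then the Base64 payload, one file per line) and
the function names are my own assumptions for illustration, not anything
streaming prescribes. The mapper only reports the decoded size, as a
stand-in for real image processing.

```python
import base64
import sys


def encode_record(name, data):
    """Return one text line: file name, a tab, then the Base64 payload."""
    # The standard Base64 alphabet contains no tab or newline, so the
    # payload cannot break the line-oriented record format.
    payload = base64.b64encode(data).decode("ascii")
    return name + "\t" + payload


def decode_record(line):
    """Invert encode_record: return (name, original bytes)."""
    name, payload = line.rstrip("\n").split("\t", 1)
    return name, base64.b64decode(payload)


def mapper(lines, out=sys.stdout):
    """Streaming-style mapper: emit 'name<TAB>size in bytes' per record."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, data = decode_record(line)
        # Real image processing would go here; the byte count is a stand-in.
        out.write("%s\t%d\n" % (name, len(data)))
```

A mapper script built on this would call mapper(sys.stdin) at the end. Note
that Base64 grows the data by roughly a third, and that this assumes file
names contain no tab or newline characters.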

Piotr

2009/5/14 openresearch <qiming...@openresearchinc.com>

>
> All,
>
> I have read some recommendations regarding image (binary input) processing
> using Hadoop streaming, which only accepts text out of the box for now.
> http://hadoop.apache.org/core/docs/current/streaming.html
> https://issues.apache.org/jira/browse/HADOOP-1722
> http://markmail.org/message/24woaqie2a6mrboc
>
> However, I have not got any straight answer.
>
> One recommendation is to put image data on HDFS, but we have to do "hdf
> -get" for each file/dir and process it locally, which is very expensive.
>
> Another recommendation is to "...put them in a centralized place where all
> the hadoop nodes can access them (via .e.g, NFS mount)..." Obviously, IO
> will become a bottleneck, and that defeats the purpose of distributed
> processing.
>
> I also noticed that an enhancement ticket is open for hadoop-core. Has it
> been committed to any svn (0.21) branch? Can somebody show me an example of
> how to take *.jpg files (from HDFS) and process them in a distributed
> fashion using streaming?
>
> Many thanks
>
> -Qiming
> --
> View this message in context:
> http://www.nabble.com/hadoop-streaming-binary-input---image-processing-tp23544344p23544344.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
