Just an addition to my previous post...

You don't have to store the encoded files in a file system, of course, since
you can write your own InputFormat that does the encoding on the fly. The
overhead should not be that big.

Piotr
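For a sense of the size side of that overhead: padded Base64 turns every 3 input bytes into 4 output characters, so the encoded stream is roughly a third larger than the raw files. Below is a small Python check of that arithmetic. It says nothing about the CPU cost of encoding on the fly, and the 1 MiB image size is only an example.

```python
import base64

def base64_encoded_length(n_bytes: int) -> int:
    """Padded Base64 output length: 4 characters per started 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)

# Cross-check the formula against the standard library on a few sizes.
for n in (0, 1, 2, 3, 4, 1000):
    assert base64_encoded_length(n) == len(base64.b64encode(bytes(n)))

raw = 1024 * 1024  # example: a 1 MiB image
enc = base64_encoded_length(raw)
print(f"{raw} bytes -> {enc} characters ({enc / raw - 1:.1%} larger)")
# prints: 1048576 bytes -> 1398104 characters (33.3% larger)
```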

2009/5/14 Piotr Praczyk <piotr.prac...@gmail.com>

> Hi
>
> If you want to read the files from HDFS and cannot pass the binary data
> directly, you can encode it (Base64, for example, though you could consider
> something more efficient, since the range of characters acceptable in the
> input string is wider than the one Base64 uses). That should solve the
> problem until some kind of binary input is supported (is that going to
> happen?).
>
> Piotr
>
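A minimal Python sketch of that encoding idea, assuming one file per text line in a hypothetical `name<TAB>payload` layout (file names must not contain tabs or newlines). Base64 is what the message suggests. Python's standard `base64.b85encode` is shown only as one example of a denser alphabet (5 characters per 4 bytes, about 25% expansion instead of 33%). Neither alphabet contains a tab or newline, so each record stays on one line of streaming's line-oriented text input.

```python
import base64

def encode_record(name: str, data: bytes, compact: bool = False) -> str:
    """Turn one binary file into a single text line: name<TAB>payload.
    compact=True uses Base85 instead of Base64 (hypothetical option)."""
    payload = base64.b85encode(data) if compact else base64.b64encode(data)
    return f"{name}\t{payload.decode('ascii')}"

def decode_record(line: str, compact: bool = False) -> "tuple[str, bytes]":
    """Inverse of encode_record; tolerates the trailing newline of an input line."""
    name, payload = line.rstrip("\n").split("\t", 1)
    data = base64.b85decode(payload) if compact else base64.b64decode(payload)
    return name, data

if __name__ == "__main__":
    image = bytes(range(256)) * 4  # 1024 stand-in bytes, not a real JPEG
    for compact in (False, True):
        line = encode_record("img001.jpg", image, compact)
        assert decode_record(line + "\n", compact) == ("img001.jpg", image)
        # payload length: 1368 chars for Base64, 1280 for Base85
        print("Base85" if compact else "Base64", len(line.split("\t", 1)[1]))
```

Whether a wider alphabet like Base85 is what the message had in mind is not stated; it is simply a standard-library option with the property described.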
> 2009/5/14 openresearch <qiming...@openresearchinc.com>
>
>
>> All,
>>
>> I have read some recommendations regarding image (binary input) processing
>> using Hadoop streaming, which only accepts text out of the box for now.
>> http://hadoop.apache.org/core/docs/current/streaming.html
>> https://issues.apache.org/jira/browse/HADOOP-1722
>> http://markmail.org/message/24woaqie2a6mrboc
>>
>> However, I have not found a straight answer.
>>
>> One recommendation is to put the image data on HDFS, but then we have to run
>> "hadoop fs -get" for each file/dir and process it locally, which is very
>> expensive.
>>
>> Another recommendation is to "...put them in a centralized place where all
>> the hadoop nodes can access them (via e.g., NFS mount)..." Obviously, IO
>> will become a bottleneck, which defeats the purpose of distributed
>> processing.
>>
>> I also noticed that an enhancement ticket is open for hadoop-core. Has it
>> been committed to any svn branch (0.21)? Can somebody show me an example of
>> how to take *.jpg files (from HDFS) and process them in a distributed
>> fashion using streaming?
>>
>> Many thanks
>>
>> -Qiming
>> --
>> View this message in context:
>> http://www.nabble.com/hadoop-streaming-binary-input---image-processing-tp23544344p23544344.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
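The question above asks for an example of processing *.jpg files with streaming. Here is a minimal sketch of a Python streaming mapper. It assumes the images have already been turned into text records in a hypothetical `name<TAB>Base64` layout (one image per line), as the Base64 suggestion in the replies would require. The "processing" is a placeholder that reports the byte count and whether the data starts with the JPEG start-of-image marker (FF D8 FF). Real image work would go in `process()`.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop streaming mapper for Base64-encoded images.

Input:  name<TAB>base64(file bytes), one image per line (assumed layout).
Output: name<TAB>size_in_bytes<TAB>1 if JPEG signature found else 0.
"""
import base64
import binascii
import sys

JPEG_SOI = b"\xff\xd8\xff"  # JPEG data begins with SOI (FF D8) followed by a marker byte FF

def process(data: bytes) -> str:
    # Placeholder "processing": byte count and a JPEG signature check.
    return f"{len(data)}\t{int(data.startswith(JPEG_SOI))}"

def map_lines(lines):
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, _, payload = line.partition("\t")
        try:
            data = base64.b64decode(payload, validate=True)
        except binascii.Error:
            # Malformed record: log it and move on instead of failing the task.
            sys.stderr.write("skipping malformed record: " + name + "\n")
            continue
        yield f"{name}\t{process(data)}"

def main() -> None:
    for out in map_lines(sys.stdin):
        sys.stdout.write(out + "\n")

if __name__ == "__main__":
    main()
```

The script reads records on stdin and writes results on stdout, which is the contract Hadoop streaming expects of a mapper. It would be given to a job through the streaming `-mapper` option (and shipped with `-file`); input paths and the streaming jar location depend on the installation.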
>>
>>
>
