A downside of this approach is that you will likely not have data locality
for the data on shared file systems, compared with data coming from an input
split.
That being said, from your script, *hadoop dfs -get FILE -* will write the
file to standard out.
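Here is a minimal sketch of that pattern as a Python streaming mapper. It assumes the job's input is a text file listing one HDFS path per line (for example, fed through `-inputformat org.apache.hadoop.mapred.lib.NLineInputFormat` so each map task receives only a few paths) and that the `hadoop` command is on every task node's PATH. `fetch_from_hdfs`, `process_image` and `map_lines` are hypothetical names. `process_image` only reports size and MD5 as a stand-in for real image work.

```python
#!/usr/bin/env python3
"""Hypothetical streaming mapper: each input line names one HDFS file.

Assumes the job input is a text file of HDFS paths, one per line, and
that the `hadoop` launcher is available on every task node.
"""
import hashlib
import subprocess
import sys


def fetch_from_hdfs(path):
    """Return the raw bytes of an HDFS file by streaming it to stdout."""
    # `hadoop dfs -get FILE -` writes the file to standard out.
    result = subprocess.run(
        ["hadoop", "dfs", "-get", path, "-"],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def process_image(data):
    """Placeholder for real image processing: report size and MD5."""
    return "%d\t%s" % (len(data), hashlib.md5(data).hexdigest())


def map_lines(lines, fetch=fetch_from_hdfs):
    """Yield one tab-separated output record per input path."""
    for line in lines:
        # If the input format passes a key (e.g. a byte offset) before a
        # tab, keep only the last field, which is the path itself.
        path = line.rstrip("\n").split("\t")[-1].strip()
        if not path:
            continue  # skip blank lines in the path list
        yield "%s\t%s" % (path, process_image(fetch(path)))


if __name__ == "__main__":
    for record in map_lines(sys.stdin):
        print(record)
```

A streaming job could ship it with `-file mapper.py -mapper mapper.py`. As noted above, each file's bytes still cross the network unless the task happens to run near that file's blocks.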

On Thu, May 14, 2009 at 10:01 AM, Piotr Praczyk <piotr.prac...@gmail.com> wrote:

> Just in addition to my previous post...
>
> You don't have to store the encoded files in a file system, of course, since
> you can write your own InputFormat which will do this on the fly... the
> overhead should not be that big.
>
> Piotr
>
> 2009/5/14 Piotr Praczyk <piotr.prac...@gmail.com>
>
> > Hi
> >
> > If you want to read the files from HDFS and cannot pass the binary data,
> > you can encode it (Base64, for example, but you can think about something
> > more efficient, since the range of characters acceptable in the input
> > string is wider than that used by Base64). It should solve the problem
> > until some kind of binary input is supported (is it going to happen?).
> >
> > Piotr
> >
> > 2009/5/14 openresearch <qiming...@openresearchinc.com>
> >
> >
> >> All,
> >>
> >> I have read some recommendations regarding image (binary input)
> >> processing using Hadoop streaming, which only accepts text out of the box
> >> for now.
> >> http://hadoop.apache.org/core/docs/current/streaming.html
> >> https://issues.apache.org/jira/browse/HADOOP-1722
> >> http://markmail.org/message/24woaqie2a6mrboc
> >>
> >> However, I have not found a straight answer.
> >>
> >> One recommendation is to put the image data on HDFS, but then we have to
> >> do "dfs -get" for each file/dir and process it locally, which is very
> >> expensive.
> >>
> >> Another recommendation is to "...put them in a centralized place where all
> >> the hadoop nodes can access them (via, e.g., an NFS mount)..." Obviously,
> >> I/O will become a bottleneck, which defeats the purpose of distributed
> >> processing.
> >>
> >> I also notice that an enhancement ticket is open for hadoop-core. Has it
> >> been committed to any svn (0.21) branch? Can somebody show me an example
> >> of how to take *.jpg files (from HDFS) and process them in a distributed
> >> fashion using streaming?
> >>
> >> Many thanks
> >>
> >> -Qiming
> >> --
> >> View this message in context:
> >> http://www.nabble.com/hadoop-streaming-binary-input---image-processing-tp23544344p23544344.html
> >> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >>
> >>
> >
>
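Piotr's encoding suggestion can be sketched in Python. The sketch assumes each record is a single line of the form `name<TAB>payload` and that file names contain no tabs or newlines. Python's standard `base64` module provides both Base64 and Base85 (`b85encode`, which uses the RFC 1924 alphabet). Base85 is one example of the "something more efficient" mentioned above: it adds 25% overhead instead of roughly 33%, and its alphabet contains no tab or newline. `encode_record` and `decode_record` are hypothetical helpers, not part of Hadoop. The encoding side could run when the files are staged, or inside a custom InputFormat as Piotr suggests. The decoding side would run in the mapper.

```python
import base64


def encode_record(name, data, dense=True):
    """Turn binary data into one tab-separated, newline-free text line."""
    # Both alphabets are printable ASCII with no tab or newline, so the
    # payload cannot break streaming's line and key/value splitting.
    encoder = base64.b85encode if dense else base64.b64encode
    return "%s\t%s" % (name, encoder(data).decode("ascii"))


def decode_record(line, dense=True):
    """Inverse of encode_record: return (name, original bytes)."""
    name, payload = line.rstrip("\n").split("\t", 1)
    decoder = base64.b85decode if dense else base64.b64decode
    return name, decoder(payload)


if __name__ == "__main__":
    blob = bytes(range(256)) * 4  # 1024 bytes of arbitrary binary data
    for dense in (False, True):
        line = encode_record("img.jpg", blob, dense)
        assert decode_record(line, dense) == ("img.jpg", blob)
        label = "base85" if dense else "base64"
        print(label, len(line) - len("img.jpg\t"))
```

For this 1024-byte input the payload is 1368 characters with Base64 and 1280 with Base85.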



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
