Re: hadoop streaming binary input / image processing

jason hadoop Fri, 15 May 2009 07:15:45 -0700

My apologies, Piotr. I was referring to the streaming case where the script
pulls the file out of a shared file system, not to using an input split that
contains the image data, as you suggest.
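
For the streaming case, a minimal sketch of such a mapper could look like the
following. It assumes each input line is one HDFS path and uses the
*hadoop dfs -get FILE -* command quoted further down to read the file from
standard out. The names fetch_bytes, process_image and run_mapper are
hypothetical, and process_image is only a placeholder for real image work.

```python
import subprocess
import sys


def fetch_bytes(path, command=("hadoop", "dfs", "-get")):
    """Return the file's bytes by running `hadoop dfs -get PATH -`.

    Assumption: the trailing "-" makes the command write to standard out,
    as described in this thread.
    """
    return subprocess.check_output(list(command) + [path, "-"])


def process_image(data):
    """Hypothetical placeholder: just report the size in bytes."""
    return len(data)


def run_mapper(lines, out, fetch=fetch_bytes):
    """Read one HDFS path per line and emit `path<TAB>result` records."""
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank input lines
        result = process_image(fetch(path))
        out.write("%s\t%d\n" % (path, result))


# As a streaming mapper, the script's entry point would call:
#     run_mapper(sys.stdin, sys.stdout)
```

The fetch function is passed in as a parameter so the loop can be tried
locally without a cluster. Every fetch goes over the network, so this sketch
has the data locality downside mentioned in the quoted text.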

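For the encoding approach quoted further down (base64 so that binary data can
pass through text-only streaming input), a minimal sketch might be the
following. The record layout, file name, a tab, then the payload, is an
assumption for illustration, and encode_record and decode_record are
hypothetical names.

```python
import base64


def encode_record(name, data):
    """Build one text line: file name, a tab, then the base64 payload.

    base64.b64encode never emits newlines, so the record stays on one line.
    """
    return "%s\t%s" % (name, base64.b64encode(data).decode("ascii"))


def decode_record(line):
    """Inverse of encode_record: return (name, original bytes)."""
    name, payload = line.rstrip("\n").split("\t", 1)
    return name, base64.b64decode(payload)
```

Base64 grows the data by about one third (4 output characters per 3 input
bytes), which is the overhead a denser encoding would reduce.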

On Thu, May 14, 2009 at 11:50 PM, Piotr Praczyk <piotr.prac...@gmail.com> wrote:

> It depends on which API you use. When writing an InputSplit implementation,
> you can specify which nodes the data resides on. I am new to Hadoop, but as
> far as I know, doing this should enable data locality. Moreover,
> implementing a subclass of TextInputFormat that encodes the data on the fly
> should not affect locality.
>
>
> Piotr
>
>
> 2009/5/15 jason hadoop <jason.had...@gmail.com>
>
> > A downside of this approach is that you are unlikely to get data locality
> > for data on shared file systems, compared with data coming from an input
> > split.
> > That being said, from your script, *hadoop dfs -get FILE -* will write the
> > file to standard out.
> >
> > On Thu, May 14, 2009 at 10:01 AM, Piotr Praczyk <piotr.prac...@gmail.com> wrote:
> >
> > > Just in addition to my previous post...
> > >
> > > You don't have to store the encoded files in a file system, of course,
> > > since you can write your own InputFormat which will do this on the fly.
> > > The overhead should not be that big.
> > >
> > > Piotr
> > >
> > > 2009/5/14 Piotr Praczyk <piotr.prac...@gmail.com>
> > >
> > > > Hi
> > > >
> > > > > If you want to read the files from HDFS and cannot pass the binary
> > > > > data, you can encode it (base64, for example, though you could
> > > > > consider something more efficient, since the range of characters
> > > > > acceptable in the input string is wider than that used by base64).
> > > > > It should solve the problem until some kind of binary input is
> > > > > supported (is that going to happen?).
> > > >
> > > > Piotr
> > > >
> > > > 2009/5/14 openresearch <qiming...@openresearchinc.com>
> > > >
> > > >
> > > >> All,
> > > >>
> > > > >> I have read some recommendations regarding image (binary input)
> > > > >> processing using Hadoop streaming, which only accepts text out of
> > > > >> the box for now.
> > > >> http://hadoop.apache.org/core/docs/current/streaming.html
> > > >> https://issues.apache.org/jira/browse/HADOOP-1722
> > > >> http://markmail.org/message/24woaqie2a6mrboc
> > > >>
> > > >> However, I have not got any straight answer.
> > > >>
> > > > >> One recommendation is to put image data on HDFS, but we have to do
> > > > >> "hdf -get" for each file/dir and process it locally, which is very
> > > > >> expensive.
> > > >>
> > > > >> Another recommendation is to "...put them in a centralized place
> > > > >> where all the hadoop nodes can access them (via .e.g, NFS
> > > > >> mount)..." Obviously, IO will become a bottleneck, and that defeats
> > > > >> the purpose of distributed processing.
> > > >>
> > > > >> I also notice that an enhancement ticket is open for hadoop-core.
> > > > >> Is it committed to any svn (0.21) branch? Can somebody show me an
> > > > >> example of how to take *.jpg files (from HDFS) and process them in
> > > > >> a distributed fashion using streaming?
> > > >>
> > > >> Many thanks
> > > >>
> > > >> -Qiming
> > > >> --
> > > > >> View this message in context:
> > > > >> http://www.nabble.com/hadoop-streaming-binary-input---image-processing-tp23544344p23544344.html
> > > >> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> > > >>
> > > >>
> > > >
> > >
> >
> >
> >
> > --
> > Alpha Chapters of my book on Hadoop are available
> > http://www.apress.com/book/view/9781430219422
> > www.prohadoopbook.com a community for Hadoop Professionals
> >
>



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
