My apologies, Piotr. I was referring to the streaming case, where the script pulls the file out of a shared file system, not to using an input split that contains the image data, as you suggest.
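
To make the streaming case concrete, here is a rough sketch of a mapper that fetches each file from inside the script. It assumes the job input is a text file with one file path per line, and that `hadoop dfs -get FILE -` writes the file to standard out as described further down in this thread. The names `fetch_bytes`, `map_lines` and `main`, and the tab-separated "path, size" output, are made up for illustration; real image processing would replace the `len(data)` step.

```python
import subprocess
import sys


def fetch_bytes(path, cmd=("hadoop", "dfs", "-get")):
    """Return the raw bytes of one file by running `<cmd> PATH -`.

    Assumption: the trailing "-" makes the command write the file to stdout.
    """
    result = subprocess.run(list(cmd) + [path, "-"],
                            stdout=subprocess.PIPE, check=True)
    return result.stdout


def map_lines(lines, fetch=fetch_bytes):
    """For each non-empty input line (one file path), yield 'path<TAB>size'."""
    for line in lines:
        path = line.strip()
        if not path:
            continue
        data = fetch(path)  # binary content never travels through stdin
        yield "%s\t%d" % (path, len(data))


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Streaming entry point: text paths in, text records out."""
    for record in map_lines(stdin):
        stdout.write(record + "\n")
```

To use it as a mapper, call `main()` at the bottom of the script and pass the script to the streaming job with `-mapper`. Note that every fetch goes through the file system client rather than an input split, so the data-locality caveat still applies.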
On Thu, May 14, 2009 at 11:50 PM, Piotr Praczyk <piotr.prac...@gmail.com> wrote:

> It depends on which API you use. When writing an InputSplit implementation,
> it is possible to specify on which nodes the data resides. I am new to
> Hadoop, but as far as I know, doing this should enable support for data
> locality. Moreover, implementing a subclass of TextInputFormat and adding
> some encoding on the fly should not impact any locality properties.
>
> Piotr
>
> 2009/5/15 jason hadoop <jason.had...@gmail.com>
>
> > A downside of this approach is that you will likely not have data
> > locality for the data on shared file systems, compared with data coming
> > from an input split.
> > That being said, from your script, *hadoop dfs -get FILE -* will write
> > the file to standard out.
> >
> > On Thu, May 14, 2009 at 10:01 AM, Piotr Praczyk <piotr.prac...@gmail.com> wrote:
> >
> > > Just in addition to my previous post...
> > >
> > > You don't have to store the encoded files in a file system, of course,
> > > since you can write your own InputFormat which will do this on the
> > > fly... The overhead should not be that big.
> > >
> > > Piotr
> > >
> > > 2009/5/14 Piotr Praczyk <piotr.prac...@gmail.com>
> > >
> > > > Hi
> > > >
> > > > If you want to read the files from HDFS and cannot pass the binary
> > > > data, you can do some encoding of it (base 64, for example, but you
> > > > can think about something more efficient, since the range of
> > > > characters acceptable in the input string is wider than that used by
> > > > BASE64). It should solve the problem until some kind of binary input
> > > > is supported (is it going to happen?).
> > > >
> > > > Piotr
> > > >
> > > > 2009/5/14 openresearch <qiming...@openresearchinc.com>
> > > >
> > > >> All,
> > > >>
> > > >> I have read some recommendations regarding image (binary input)
> > > >> processing using Hadoop streaming, which only accepts text out of
> > > >> the box for now.
> > > >> http://hadoop.apache.org/core/docs/current/streaming.html
> > > >> https://issues.apache.org/jira/browse/HADOOP-1722
> > > >> http://markmail.org/message/24woaqie2a6mrboc
> > > >>
> > > >> However, I have not gotten a straight answer.
> > > >>
> > > >> One recommendation is to put image data on HDFS, but we have to do
> > > >> "hdf -get" for each file/dir and process it locally, which is very
> > > >> expensive.
> > > >>
> > > >> Another recommendation is to "...put them in a centralized place
> > > >> where all the hadoop nodes can access them (via, e.g., NFS
> > > >> mount)..." Obviously, I/O will become a bottleneck, and that defeats
> > > >> the purpose of distributed processing.
> > > >>
> > > >> I also notice that an enhancement ticket is open for hadoop-core. Is
> > > >> it committed to any svn (0.21) branch? Can somebody show me an
> > > >> example of how to take *.jpg files (from HDFS) and process them in a
> > > >> distributed fashion using streaming?
> > > >>
> > > >> Many thanks
> > > >>
> > > >> -Qiming
> > > >> --
> > > >> View this message in context:
> > > >> http://www.nabble.com/hadoop-streaming-binary-input---image-processing-tp23544344p23544344.html
> > > >> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >
> > --
> > Alpha Chapters of my book on Hadoop are available
> > http://www.apress.com/book/view/9781430219422
> > www.prohadoopbook.com a community for Hadoop Professionals
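
For reference, the base64 suggestion quoted above could be sketched roughly as follows. This is only an illustration under assumptions: each binary file becomes one text line of the form "name, tab, base64 payload", so a text-only streaming job can carry it, and the mapper decodes the payload again. The helper names `encode_record` and `decode_record` and the record layout are hypothetical, not part of Hadoop.

```python
import base64


def encode_record(name, data):
    """Turn (file name, raw bytes) into one newline-free text record."""
    payload = base64.b64encode(data).decode("ascii")
    return name + "\t" + payload


def decode_record(line):
    """Inverse of encode_record: recover (file name, raw bytes) from a line."""
    name, payload = line.rstrip("\n").split("\t", 1)
    return name, base64.b64decode(payload)
```

Base64 output contains no tabs or newlines, so one record stays on one line, at the cost of roughly a one-third increase in size (4 output characters per 3 input bytes).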