Hi Aravinth,

This is probably do-able with Hadoop Streaming.
Imagine you have copied a bunch of image files to HDFS and now you want to
point them at, say, an executable. Odds are that executable already exists,
with command line options that take, amongst other things, the file path of
the image you would like to process. Hadoop Streaming makes a number of
environment variables available at runtime, for instance "map_input_file",
which gives you the name of the file being processed, and so forth. My guess
is that there is also an environment variable that will give you the file
path in the local filesystem. You need to code that in, plus add a -file
parameter to ship your executable. If you are using Amazon's EMR, you will
need to put your code and executable into an S3 bucket, then specify the
bucket name to Hadoop Streaming.

Good luck
Daniel

On 29 November 2010 22:49, Shrijeet Paliwal <[email protected]> wrote:
> This gentleman here (see below) is doing some Hadoop Streaming magic and
> seems to be playing with image features in a map-reduce way. It's not
> using Hadoop's Java API, though, so no help there.
> Still, you can check and see if the article gives you some clues:
> http://techportal.ibuildings.com/2009/11/02/precision-color-searching-with-gmagick-and-amazon-elastic-mapreduce/
>
> PS: Pardon if the motivation in the article is orthogonal to yours.
>
> -Shrijeet
>
> On Mon, Nov 29, 2010 at 2:13 PM, Aravinth Bheemaraj
> <[email protected]> wrote:
>> Michael, thanks a lot for your reply.
>>
>> I have to compare the images based on pixels. So is it possible to process
>> the image based on pixel values rather than XML records?
>>
>> I have read somewhere that the class "InputFormat" can be customized to
>> handle images by extending "InputSplit" and "RecordReader". But I am unsure
>> of the methods which are to be overridden so that I can access the pixels
>> of the image. Is there any way you can help me with this?
>>
>> Regarding the note, I am reading in a directory with multiple image files.
>>
>> On Mon, Nov 29, 2010 at 4:08 PM, Michael Segel
>> <[email protected]> wrote:
>>
>>> Hi,
>>> The short answer is yes, you can process images in Hadoop.
>>> Think of the image as a multi-line byte stream.
>>>
>>> As to an existing class, I don't believe one exists, but it shouldn't be
>>> too difficult to cobble together.
>>> (If you can read in XML records for processing, you should be able to
>>> read in a file containing a series of images.)
>>>
>>> Note: I'm assuming that you're either reading in a directory with
>>> multiple image files, or an image file with multiple images. Otherwise
>>> you probably don't want to use Hadoop.
>>>
>>> > Date: Mon, 29 Nov 2010 14:56:35 -0500
>>> > Subject: Image as input to M-R in Hadoop
>>> > From: [email protected]
>>> > To: [email protected]
>>> >
>>> > Hi,
>>> >
>>> > I am a beginner with Hadoop and I am looking for some help in
>>> > implementing the Mapper with an image as input. Is there any predefined
>>> > Writable class for processing images? If so, how do I use it?
>>> >
>>> > Also, I have read somewhere that compressed formats cannot be processed
>>> > in Hadoop. If this is true, does it follow that JPEG images (which are
>>> > also a compressed format) cannot be processed by Hadoop? Please correct
>>> > me if I have misunderstood this concept.
>>> >
>>> > Thanks,
>>> > --
>>> > Aravinth
>>>
>>
>> --
>> Aravinth Bheemaraj
>> University of Florida
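[Editor's sketch] Daniel's streaming approach above can be roughed out as a
small Python mapper. Everything here is illustrative, not verified against a
particular Hadoop release: the environment variable name differs across
versions ("map_input_file" in older Streaming, "mapreduce_map_input_file" in
newer ones), and "process_image" below is a hypothetical stand-in for
whatever executable you would ship with -file.

```python
#!/usr/bin/env python
# Sketch of a Hadoop Streaming mapper along the lines Daniel describes.
# Assumptions (not checked against any specific Hadoop version):
#   - Streaming exports the current input file's path in the environment
#     as "map_input_file" (newer releases: "mapreduce_map_input_file");
#   - the actual image work would be done by a shipped executable
#     (hypothetical "process_image"), invoked here only in the comments.
import os
import sys

def run_mapper(stdin=sys.stdin, stdout=sys.stdout, env=None):
    env = env if env is not None else os.environ
    # For an image job, the record content on stdin may be irrelevant;
    # what matters is which file this map task is reading.
    input_file = (env.get("map_input_file")
                  or env.get("mapreduce_map_input_file", "unknown"))
    for line in stdin:
        # Here you would shell out to your executable instead, e.g.
        #   subprocess.call(["./process_image", input_file])
        # Emit key<TAB>value pairs, as Streaming expects on stdout.
        stdout.write("%s\t%d\n" % (input_file, len(line)))

if __name__ == "__main__":
    run_mapper()
```

An illustrative (unverified) launch would ship both the mapper and the
binary with -file flags, e.g.:
hadoop jar hadoop-streaming.jar -input images -output out \
  -mapper mapper.py -file mapper.py -file process_image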

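[Editor's sketch] As for Aravinth's pixel-comparison question, here is a
dependency-free sketch of the per-image-pair work a mapper would do. A real
job would decode JPEGs with an image library (PIL, GraphicsMagick, etc.)
inside the task; this example parses the ASCII PPM (P3) format instead, and
assumes the file contains no '#' comments, purely so it stands alone.

```python
# Pixel-level comparison of two same-sized images, the kind of work
# Aravinth's mapper would do per image pair. Uses plain-text PPM (P3)
# only to avoid an image-decoding dependency; assumes no '#' comments.

def parse_ppm(text):
    """Parse an ASCII PPM (P3) image into (width, height, [(r, g, b), ...])."""
    tokens = text.split()
    if tokens[0] != "P3":
        raise ValueError("not a plain (P3) PPM image")
    width, height = int(tokens[1]), int(tokens[2])
    # tokens[3] is the max sample value; the RGB samples follow it.
    values = [int(t) for t in tokens[4:4 + width * height * 3]]
    pixels = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return width, height, pixels

def pixel_diff_count(img_a, img_b):
    """Number of pixel positions at which two same-sized images differ."""
    wa, ha, pa = parse_ppm(img_a)
    wb, hb, pb = parse_ppm(img_b)
    if (wa, ha) != (wb, hb):
        raise ValueError("images differ in size")
    return sum(1 for a, b in zip(pa, pb) if a != b)
```

In a Streaming job, each map task could emit (file_pair, diff_count) pairs,
leaving the reducer to rank or threshold the similarity scores.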