Hi,

Depending on which Hadoop version EC2 uses (0.18.3?), you can try one of the following:
1. Compile the streaming jar with your own custom classes included, and run on EC2 using this custom jar (should work for 0.18.3; make sure you pick compatible streaming classes).

2. Jar up your classes, pass the jar with the -libjars option on the command line, and specify the custom input and output formats as you do on your local machine (should work for >= 0.19).

I have never worked on EC2, so I am not sure whether an easier solution exists.

Amogh

On 6/2/10 1:52 AM, "Mo Zhou" <[email protected]> wrote:

> Hi,
>
> I know this may be more of an EC2 question than a Hadoop one, but I could not
> find a solution and hope someone here can kindly help me out.
>
> Here is my question. I created my own input reader and output formatter to
> split an input file for use with Hadoop streaming. They are tested on my
> local machine. Here is how I use them:
>
>   bin/hadoop jar hadoop-0.20.2-streaming.jar \
>     -D mapred.map.tasks=4 \
>     -D mapred.reduce.tasks=0 \
>     -input HumanSeqs.4 \
>     -output output \
>     -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
>     -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
>     -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"
>
> I want to deploy this job to Elastic MapReduce. I first create a streaming
> job and specify the input and output in S3, the mapper, and the reducer.
> However, I could not find a place to specify -inputreader and -inputformat.
>
> So my questions are:
> 1) How can I upload the class files to be used as the input reader and input
>    format to Elastic MapReduce?
> 2) How do I tell the streaming job to use them?
>
> Any reply is appreciated. Thanks for your time!
>
> --
> Thanks,
> Mo
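
A rough sketch of option 2 above, adapted to the command from the quoted message. The jar name (fasta-streaming.jar) and the classes/ directory are placeholders, not from the original thread; adjust them to wherever your compiled .class files actually live:

```shell
# Package the custom record reader and input format into a jar.
# (Hypothetical jar name and class directory -- substitute your own.)
jar cf fasta-streaming.jar -C classes/ org/apache/hadoop/streaming/

# Run the streaming job, shipping the jar to the cluster with -libjars.
# Note: -libjars is a generic option and must come before the job-specific
# options for Hadoop to pick it up.
bin/hadoop jar hadoop-0.20.2-streaming.jar \
  -libjars fasta-streaming.jar \
  -D mapred.map.tasks=4 \
  -D mapred.reduce.tasks=0 \
  -input HumanSeqs.4 \
  -output output \
  -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
  -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
  -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"
```

For Elastic MapReduce specifically, I believe the streaming job setup accepts extra arguments beyond mapper/reducer/input/output, so the same -inputformat/-inputreader flags could be passed there, with the jar uploaded to S3 first -- but check the EMR documentation, since I have not tried this myself.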
