Hi,

Depending on which Hadoop version EC2 uses (0.18.3?), you can try one of the following:
1. Compile the streaming jar with your own custom classes included, and run on EC2 using this custom jar (should work for 0.18.3; make sure you pick compatible streaming classes).

2. Jar up your classes, pass the jar with the -libjars option on the command line, and specify the custom input and output formats as you do on your local machine (should work for >= 0.19).

I have never worked on EC2, so I am not sure whether an easier solution exists.

Amogh

On 6/2/10 1:52 AM, "Mo Zhou" <[email protected]> wrote:

> Hi,
>
> I know this may be more of an EC2 question than a Hadoop one, but I could not
> find a solution and hope someone here can kindly help me out.
>
> Here is my question. I created my own input reader and output formatter to
> split an input file for use with Hadoop streaming. They are tested on my
> local machine. Here is how I use them:
>
>   bin/hadoop jar hadoop-0.20.2-streaming.jar \
>     -D mapred.map.tasks=4 \
>     -D mapred.reduce.tasks=0 \
>     -input HumanSeqs.4 \
>     -output output \
>     -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
>     -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
>     -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"
>
> I want to deploy this job to Elastic MapReduce. I first create a streaming
> job and specify the input and output in S3, the mapper, and the reducer.
> However, I could not find a place to specify -inputreader and -inputformat.
>
> So my questions are:
> 1) How can I upload the class files to be used as the input reader and input
>    format to Elastic MapReduce?
> 2) How do I tell the streaming job to use them?
>
> Any reply is appreciated. Thanks for your time!
>
> --
> Thanks,
> Mo
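
A rough sketch of option 2 above, adapted to the command from the quoted message. The jar name (fasta-streaming.jar) and the classes/ directory are placeholders, not from the original thread; adjust them to wherever your compiled .class files actually live:

```shell
# Package the custom record reader and input format into a jar.
# (Hypothetical jar name and class directory -- substitute your own.)
jar cf fasta-streaming.jar -C classes/ org/apache/hadoop/streaming/

# Run the streaming job, shipping the jar to the cluster with -libjars.
# Note: -libjars is a generic option and must come before the job-specific
# options for Hadoop to pick it up.
bin/hadoop jar hadoop-0.20.2-streaming.jar \
  -libjars fasta-streaming.jar \
  -D mapred.map.tasks=4 \
  -D mapred.reduce.tasks=0 \
  -input HumanSeqs.4 \
  -output output \
  -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
  -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
  -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"
```

For Elastic MapReduce specifically, I believe the streaming job setup accepts extra arguments beyond mapper/reducer/input/output, so the same -inputformat/-inputreader flags could be passed there, with the jar uploaded to S3 first -- but check the EMR documentation, since I have not tried this myself.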
