Hi Harsh,

Thanks for getting back so quickly.

The full source code is attached as there's nothing sensitive in it.
Coding wouldn't be my strong point so apologies in advance if it looks a
mess.

Thanks

On Sat, Feb 9, 2013 at 6:09 PM, Harsh J <ha...@cloudera.com> wrote:

> Whatever "csatAnalysis.MapClass" the compiler picked up, it appears to
> not be extending the org.apache.hadoop.mapreduce.Mapper class. From
> your snippets it appears that you have it all defined properly though.
> A common issue here has also been that people accidentally import the
> wrong API (mapred.*) but that doesn't seem to be the case either.
>
> Can you post your full compilable source somewhere? Remove any logic
> you don't want to share - we'd mostly be interested in the framework
> definition parts alone.
>
> On Sat, Feb 9, 2013 at 11:27 PM, Ronan Lehane <ronan.leh...@gmail.com>
> wrote:
> > Hi All,
> >
> > I hope this is the right forum for this type of question so my apologies
> if
> > not.
> >
> > I'm looking to write a map reduce program which is giving me the
> following
> > compilation error:
> > The method setMapperClass(Class<? extends Mapper>) in the type Job is not
> > applicable for the arguments (Class<csatAnalysis.MapClass>)
> >
> > The components involved are:
> >
> > 1. Setting the Mapper
> >         //Set the Mapper for the job. Calls MapClass.class
> >         job.setMapperClass(MapClass.class);
> >
> > 2. Setting the inputFormat to TextInputFormat
> >         //An InputFormat for plain text files. Files are broken into
> lines.
> >         //Either linefeed or carriage-return are used to signal end of
> > line.
> >         //Keys are the position in the file, and values are the line of
> > text..
> >         job.setInputFormatClass(TextInputFormat.class);
> >
> > 3. Taking the data into the mapper and processing it
> >     public static class MapClass extends Mapper<LongWritable, Text, Text,
> > VectorWritable> {
> >         public void map (LongWritable key, Text value,Context context)
> > throws IOException, InterruptedException {
> >
> > Would anyone have any clues as to what would be wrong with the arguements
> > being passed to the Mapper?
> >
> > Any help would be appreciated,
> >
> > Thanks.
>
>
>
> --
> Harsh J
>
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.FileNotFoundException;
import java.io.FileInputStream;
import java.util.*;
import java.io.File;

import org.apache.mahout.vectorizer.encoders.*;
import org.apache.mahout.classifier.sgd.*;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.MurmurHash;
import org.apache.mahout.math.VectorWritable;
import org.apache.mahout.common.HadoopUtil;

import org.apache.hadoop.util.hash.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.fs.Path;

import com.google.common.base.Splitter;
import com.google.common.base.CharMatcher;
import com.google.common.collect.Lists;
import com.google.common.collect.Maps;

import org.apache.commons.logging.*;

public class csatAnalysis {
        
        public static class MapClass extends Mapper<LongWritable, Text, Text, 
VectorWritable> {
                
                CsvRecordFactory csv;           
                Map<String, String> typeMap;
                
                public void map (LongWritable key, Text value, Context context) 
throws IOException, InterruptedException {
                        
                        //Create a map which will be used to create the 
csvRecordFactory
                        typeMap = Maps.newHashMap();
                        //Populate the HashMap
                        Iterator<String> iTypes = typeList.iterator();
                        String lastType = null;
                        for (Object x : predictorList) {
                                // type list can be short .. we just repeat 
last spec
                                if (iTypes.hasNext()) {
                                        lastType = iTypes.next();
                                }
                                typeMap.put(x.toString(), lastType);
                        }               
                        
                        csv = new CsvRecordFactory("Answer5", "SR_Number", 
typeMap);
                        csv.firstLine(fieldNames);
                        csv.maxTargetValue(2);
                        
                        Vector input = new RandomAccessSparseVector(50000);
                        String line = value.toString();
                        int record = csv.processLine(strLine, input);
                        String keyValue = csv.getIdString(line);
                        context.write(new Text(keyValue), new 
VectorWritable(input));   
                }
        }
        
        public static class Reduce extends Reducer<Text, VectorWritable, Text, 
VectorWritable> {
                @Override
                public void reduce(Text key, Iterable<VectorWritable> values, 
Context context) throws IOException, InterruptedException {
                        for (VectorWritable value : values) {
                                context.write(key, value);
                        }
                }
        }
        
        
        public static void main(String[] args) throws Exception {
                
                Splitter COMMA = Splitter.on(',').trimResults(CharMatcher.is(' 
'));
                
                //variables for the hadoop configuration
                String targetVariable = "Answer5";
                String keyVariable = "SR_Number";
                String fieldNames = 
"SR_Number,Case_Owner,SupportCenter,Severity,BusinessDaysToClose,Problem_Category,Entitlement_Desc,Customer_Country,Contact_Email,FirstResponseMet,EscalationRequests,PromoteRequests,Answer5";
                //Create two lists. One of the predictors and one for the types.
                String predictors = 
"SR_Number,Case_Owner,SupportCenter,Severity,BusinessDaysToClose,Problem_Category,Entitlement_Desc,Customer_Country,Contact_Email,FirstResponseMet,EscalationRequests,PromoteRequests";
                String types = 
"continuous,text,word,numeric,numeric,text,text,text,word,numeric,numeric,numeric,numeric";
 
                //Split the strings into Lists of substrings.
                List<String> typeList = Lists.newArrayList(COMMA.split(types));
                List<String> predictorList = 
Lists.newArrayList(COMMA.split(predictors));
                List<String> targetCategories = Lists.newArrayList("0","1");
                String inputFile = new 
String("/home/training/workspace/Ronan/2012CSATs-noHeader-BinaryTarget.csv");
                String outputPath = new 
String("/home/training/workspace/Ronan/output");
                
                //Test to confirm that the List Variables have been 
successfully added. 
                //for(String s : predictorList){
                //    System.out.println(s);
                //}             
                
                //Create a new configuration object for use with Hadoop.
                //set(String name, String value) 
                //Set the "value" of the "name" property.
                Configuration conf = new Configuration();
                conf.set("targetVariable", targetVariable);
                conf.set("keyVariable", keyVariable);
                conf.set("csvHeader", fieldNames);
                conf.set("predictorListString", predictors);
                conf.set("typeListString", types);
                
                //Create a mapreduce job
                //The job submitter's view of the Job. 
                //It allows the user to configure the job, submit it, control 
its execution, and query the state.
                Job job = new Job(conf);
                job.setJobName("csatAnalysis");
                
                //Set the key class for the job output data.
                job.setOutputKeyClass(Text.class);
                
                //Set the value class for job outputs.
                job.setOutputValueClass(VectorWritable.class);
                
                //Set input and output path
                FileInputFormat.setInputPaths(job, new Path(inputFile));
                Path outPath = new Path(outputPath);
                FileOutputFormat.setOutputPath(job, outPath);
                
                //Set the Mapper for the job. Calls MapClass.class
                job.setMapperClass(MapClass.class);
                                
                //Set the Reducer for the job. Calls Csv2SeqReducer.class
                job.setReducerClass(Reduce.class);
                
                //An InputFormat for plain text files. Files are broken into 
lines. 
                //Either linefeed or carriage-return are used to signal end of 
line. 
                //Keys are the position in the file, and values are the line of 
text.. 
                job.setInputFormatClass(TextInputFormat.class);
                
                //An OutputFormat that writes SequenceFiles. 
                job.setOutputFormatClass(SequenceFileOutputFormat.class);
                
                //Set the Jar by finding where a given class came from.
                job.setJarByClass(csatAnalysis.class);
                
                HadoopUtil.delete(conf, outPath);
                
                //Submit the job to the cluster and wait for it to finish.
                job.waitForCompletion(true);            

        }
}

Reply via email to