Yeah, I solved the problem. I had to move the configuration options from the main() function of my code into the workflow.xml file. Oozie's <map-reduce> action builds the job configuration directly from workflow.xml and never runs the driver's main(), so anything set there is silently ignored. The source of my confusion was the "Quick Start" documentation; I strongly feel the documentation needs to be clear about this.
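Concretely, the fix looked roughly like this. Everything main() was setting on the JobConf has to be expressed as old-API mapred.* properties inside the action's <configuration> block. A minimal sketch, assuming the WordCount classes quoted below (the property names are, as far as I can tell, the legacy equivalents of the JobConf setters):

  <!-- goes inside <configuration> of the wordcount <map-reduce> action -->
  <property>
    <!-- was: conf.setOutputKeyClass(Text.class) -->
    <name>mapred.output.key.class</name>
    <value>org.apache.hadoop.io.Text</value>
  </property>
  <property>
    <!-- was: conf.setOutputValueClass(IntWritable.class) -->
    <name>mapred.output.value.class</name>
    <value>org.apache.hadoop.io.IntWritable</value>
  </property>
  <property>
    <!-- was: conf.setInputFormat(TextInputFormat.class) -->
    <name>mapred.input.format.class</name>
    <value>org.apache.hadoop.mapred.TextInputFormat</value>
  </property>
  <property>
    <!-- was: conf.setOutputFormat(TextOutputFormat.class) -->
    <name>mapred.output.format.class</name>
    <value>org.apache.hadoop.mapred.TextOutputFormat</value>
  </property>

With the output key class set to Text, the map output key class defaults to it as well, so the job stops falling back to LongWritable and the type-mismatch error goes away.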
On Wed, May 23, 2012 at 10:30 PM, Alejandro Abdelnur <[email protected]> wrote:

> Hi Anshul,
>
> It looks like you are setting the key/value classes for the
> input/output/intermediate-output and Hadoop is trying to use the default
> ones.
>
> thx
>
> On Tue, May 22, 2012 at 11:21 PM, Anshul Singhle <[email protected]> wrote:
>
> > Hi all,
> > I tried running the wordcount example on oozie and I'm getting the
> > following exception in my hadoop log:
> >
> > ERROR org.apache.hadoop.security.UserGroupInformation:
> > PriviledgedActionException as:cloudera (auth:SIMPLE)
> > cause:java.io.IOException: Type mismatch in key from map: expected
> > org.apache.hadoop.io.LongWritable, recieved org.apache.hadoop.io.Text
> > 2012-05-22 18:58:27,832 WARN org.apache.hadoop.mapred.Child: Error running child
> > java.io.IOException: Type mismatch in key from map: expected
> > org.apache.hadoop.io.LongWritable, recieved org.apache.hadoop.io.Text
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:871)
> >     at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:499)
> >     at WordCount$Map.map(WordCount.java:22)
> >     at WordCount$Map.map(WordCount.java:12)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:264)
> >
> > WordCount.java is copied verbatim from the Apache MapReduce tutorial. I
> > removed the combiner, which solved the problem for some people over at
> > Stack Overflow, but I'm still getting the error. I ran the same jar
> > directly in hadoop and got the correct output with no error. Posting
> > WordCount.java for completeness.
> >
> > import java.io.IOException;
> > import java.util.*;
> >
> > import org.apache.hadoop.fs.Path;
> > import org.apache.hadoop.conf.*;
> > import org.apache.hadoop.io.*;
> > import org.apache.hadoop.mapred.*;
> > import org.apache.hadoop.util.*;
> >
> > public class WordCount {
> >
> >   public static class Map extends MapReduceBase implements
> >       Mapper<LongWritable, Text, Text, IntWritable> {
> >     private final static IntWritable one = new IntWritable(1);
> >     private Text word = new Text();
> >
> >     public void map(LongWritable key, Text value,
> >         OutputCollector<Text, IntWritable> output, Reporter reporter)
> >         throws IOException {
> >       String line = value.toString();
> >       System.out.println("here" + "\t" + line + "\t" + key);
> >       StringTokenizer tokenizer = new StringTokenizer(line);
> >       while (tokenizer.hasMoreTokens()) {
> >         word.set(tokenizer.nextToken());
> >         output.collect(word, one);
> >       }
> >     }
> >   }
> >
> >   public static class Reduce extends MapReduceBase implements
> >       Reducer<Text, IntWritable, Text, IntWritable> {
> >     public void reduce(Text key, Iterator<IntWritable> values,
> >         OutputCollector<Text, IntWritable> output, Reporter reporter)
> >         throws IOException {
> >       int sum = 0;
> >       while (values.hasNext()) {
> >         sum += values.next().get();
> >       }
> >       output.collect(key, new IntWritable(sum));
> >     }
> >   }
> >
> >   public static void main(String[] args) throws Exception {
> >     JobConf conf = new JobConf(WordCount.class);
> >     //conf.setJobName("wordcount");
> >     //conf.setJar("wordcount.jar");
> >     conf.setOutputKeyClass(Text.class);
> >     conf.setOutputValueClass(IntWritable.class);
> >
> >     conf.setMapperClass(Map.class);
> >     conf.setReducerClass(Reduce.class);
> >
> >     conf.setInputFormat(TextInputFormat.class);
> >     conf.setOutputFormat(TextOutputFormat.class);
> >
> >     //FileInputFormat.setInputPaths(conf, new Path(args[0]));
> >     //FileOutputFormat.setOutputPath(conf, new Path(args[1]));
> >
> >     JobClient.runJob(conf);
> >   }
> > }
> >
> > Note that if I change mapper to map and reducer to reduce, I don't get
> > the error but I get wrong output. I checked the input given to the
> > mapper and that is apparently empty. And here is my workflow.xml:
> >
> > <workflow-app name='wordcount-wf' xmlns="uri:oozie:workflow:0.1">
> >   <start to='wordcount'/>
> >   <action name='wordcount'>
> >     <map-reduce>
> >       <job-tracker>${jobTracker}</job-tracker>
> >       <name-node>${nameNode}</name-node>
> >       <prepare>
> >       </prepare>
> >       <configuration>
> >         <property>
> >           <name>mapred.job.queue.name</name>
> >           <value>${queueName}</value>
> >         </property>
> >         <property>
> >           <name>mapred.mapper.class</name>
> >           <value>WordCount$Map</value>
> >         </property>
> >         <property>
> >           <name>mapred.reducer.class</name>
> >           <value>WordCount$Reduce</value>
> >         </property>
> >         <property>
> >           <name>mapred.input.dir</name>
> >           <value>${inputDir}</value>
> >         </property>
> >         <property>
> >           <name>mapred.output.dir</name>
> >           <value>${outputDir}</value>
> >         </property>
> >       </configuration>
> >     </map-reduce>
> >     <ok to='end'/>
> >     <error to='end'/>
> >   </action>
> >   <kill name='kill'>
> >     <message>${wf:errorCode("wordcount")}</message>
> >   </kill>
> >   <end name='end'/>
> > </workflow-app>
> >
> > I'm running oozie and hadoop on a single VM taken from cloudera.
> > oozie version:
> > Oozie client build version: 2.3.2-cdh3u4
> >
> > hadoop version:
> > Subversion file:///data/1/tmp/topdir/BUILD/hadoop-0.20.2-cdh3u4 -r
> > 214dd731e3bdb687cb55988d3f47dd9e248c5690
> > Compiled by root on Mon May 7 14:03:02 PDT 2012
> > From source with checksum a60c9795e41a3248b212344fb131c12c
>
> --
> Alejandro
