Re: maprd vs mapreduce api
The Mapper and Reducer classes in org.apache.hadoop.mapreduce implement the identity function, so you should be able to just do

    conf.setMapperClass(org.apache.hadoop.mapreduce.Mapper.class);
    conf.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class);

without having to implement your own no-op classes. I recommend reading the javadoc for the differences between the old API and the new API; for example, http://hadoop.apache.org/common/docs/r0.20.2/api/index.html describes the different functionality of Mapper in the new API and its dual use as the identity mapper.

Cheers,
--Keith

On Aug 5, 2011, at 1:15 PM, garpinc wrote:

> I was following this tutorial on version 0.19.1:
> http://v-lad.org/Tutorials/Hadoop/23%20-%20create%20the%20project.html
> However, I wanted to use the latest version of the API, 0.20.2. The original code in the tutorial had the following lines:
>
>     conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);
>     conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);
>
> Both Identity classes are deprecated.
> So it seemed the solution was to create a mapper and reducer as follows:
>
>     public static class NOOPMapper
>             extends Mapper<Text, IntWritable, Text, IntWritable> {
>         public void map(Text key, IntWritable value, Context context)
>                 throws IOException, InterruptedException {
>             context.write(key, value);
>         }
>     }
>
>     public static class NOOPReducer
>             extends Reducer<Text, IntWritable, Text, IntWritable> {
>         private IntWritable result = new IntWritable();
>         public void reduce(Text key, Iterable<IntWritable> values, Context context)
>                 throws IOException, InterruptedException {
>             context.write(key, result);
>         }
>     }
>
> And then, with the driver code:
>
>     Configuration conf = new Configuration();
>     Job job = new Job(conf, "testdriver");
>     job.setOutputKeyClass(Text.class);
>     job.setOutputValueClass(IntWritable.class);
>     job.setInputFormatClass(TextInputFormat.class);
>     job.setOutputFormatClass(TextOutputFormat.class);
>     FileInputFormat.addInputPath(job, new Path("In"));
>     FileOutputFormat.setOutputPath(job, new Path("Out"));
>     job.setMapperClass(NOOPMapper.class);
>     job.setReducerClass(NOOPReducer.class);
>     job.waitForCompletion(true);
>
> However, I get this message:
>
>     java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
>         at TestDriver$NOOPMapper.map(TestDriver.java:1)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>     11/08/01 16:41:01 INFO mapred.JobClient:  map 0% reduce 0%
>     11/08/01 16:41:01 INFO mapred.JobClient: Job complete: job_local_0001
>     11/08/01 16:41:01 INFO mapred.JobClient: Counters: 0
>
> Can anyone tell me what I need for this to work? The full code is attached:
> http://old.nabble.com/file/p32174859/TestDriver.java
>
> --
> View this message in context: http://old.nabble.com/maprd-vs-mapreduce-api-tp32174859p32174859.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
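Keith's suggestion above can be sketched as a complete new-API driver. This is an untested sketch assuming Hadoop 0.20.2 on the classpath; the class name `IdentityJob` and the use of command-line arguments for the paths are illustrative. Note that because TextInputFormat produces (LongWritable offset, Text line) records, an identity job's output key/value classes must be LongWritable and Text:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class IdentityJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "identity");

        // The new-API base Mapper and Reducer pass every record through
        // unchanged, so they can be used directly as identity classes.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);

        // TextInputFormat emits (LongWritable byte-offset, Text line) pairs;
        // the identity job's declared output types must match them.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This sidesteps the tutorial's deprecated org.apache.hadoop.mapred.lib.IdentityMapper/IdentityReducer entirely, since the new API folds the identity behavior into the base classes themselves.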
Re: maprd vs mapreduce api
On Fri, Aug 5, 2011 at 3:42 PM, Stevens, Keith D. steven...@llnl.gov wrote:

> The Mapper and Reducer classes in org.apache.hadoop.mapreduce implement the identity function, so you should be able to just do
>
>     conf.setMapperClass(org.apache.hadoop.mapreduce.Mapper.class);
>     conf.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class);
>
> without having to implement your own no-op classes. I recommend reading the javadoc for the differences between the old API and the new API; for example, http://hadoop.apache.org/common/docs/r0.20.2/api/index.html describes the different functionality of Mapper in the new API and its dual use as the identity mapper.

Sorry for asking on this thread :) Does Definitive Guide 2 cover the new API?

> Cheers,
> --Keith
>
> On Aug 5, 2011, at 1:15 PM, garpinc wrote:
> [...]
Re: maprd vs mapreduce api
Your reducer is writing IntWritable, but your output format class is still Text. Change one of those so they match the other.

On Mon, Aug 1, 2011 at 8:40 PM, garpinc garp...@hotmail.com wrote:
> [...]

--
Roger Chen
UC Davis Genome Center
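For reference, the ClassCastException in the original post comes from the mapper's declared input types rather than the job's output classes: TextInputFormat delivers LongWritable byte offsets as keys and Text lines as values, but NOOPMapper declares Text/IntWritable inputs. A pass-through mapper compatible with TextInputFormat could be declared along these lines (an untested sketch, not the poster's attached code; emitting the line as the key and its byte length as an illustrative IntWritable value, so the outputs match the Text/IntWritable classes the poster's driver declares):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// TextInputFormat's keys are LongWritable byte offsets and its values are
// Text lines, so the mapper's first two type parameters must be declared
// to match, or the framework's cast to the declared types fails at runtime.
public class NoOpMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit records matching the declared output types: the line itself
        // as the key, and its length in bytes as an illustrative value.
        context.write(value, new IntWritable(value.getLength()));
    }
}
```

The same consistency rule applies on the reduce side: the reducer's input type parameters must match the mapper's output types, and the job's setOutputKeyClass/setOutputValueClass must match whatever the reducer actually writes.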