Re: mapred vs mapreduce api

2011-08-05 Thread Stevens, Keith D.
The Mapper and Reducer classes in org.apache.hadoop.mapreduce implement the 
identity function.  So you should be able to just do 

job.setMapperClass(org.apache.hadoop.mapreduce.Mapper.class);
job.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class);

without having to implement your own no-op classes.
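
For example, a complete new-API driver for a pass-through job could look
roughly like this (a sketch only -- the IdentityDriver class name and the
argument-based paths are placeholders, not from your code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class IdentityDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "identity");
    job.setJarByClass(IdentityDriver.class);

    // The base classes pass every record through unchanged.
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);

    // TextInputFormat feeds the mapper <LongWritable, Text> records
    // (byte offset, line); the identity map/reduce keeps those types,
    // so the job's output key/value classes must match them.
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}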

I recommend reading the javadoc for the differences between the old API and 
the new API. For example, http://hadoop.apache.org/common/docs/r0.20.2/api/index.html 
describes the different behavior of Mapper in the new API and its dual use 
as the identity mapper.

Cheers,
--Keith

On Aug 5, 2011, at 1:15 PM, garpinc wrote:

 
 I was following this tutorial, which is written for version 0.19.1:
 
 http://v-lad.org/Tutorials/Hadoop/23%20-%20create%20the%20project.html
 
 However, I wanted to use the latest version of the API, 0.20.2.
 
 The original code in the tutorial had the following lines:
 conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);
 conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);
 
 Both Identity classes are deprecated, so it seemed the solution was to create
 a mapper and reducer as follows:
 public static class NOOPMapper
      extends Mapper<Text, IntWritable, Text, IntWritable> {
 
   public void map(Text key, IntWritable value, Context context)
       throws IOException, InterruptedException {
     context.write(key, value);
   }
 }
 
 public static class NOOPReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
   private IntWritable result = new IntWritable();
 
   public void reduce(Text key, Iterable<IntWritable> values, Context context)
       throws IOException, InterruptedException {
     context.write(key, result);
   }
 }
 
 
 And then ran it with this driver code:
   Configuration conf = new Configuration();
   Job job = new Job(conf, "testdriver");
 
   job.setOutputKeyClass(Text.class);
   job.setOutputValueClass(IntWritable.class);
 
   job.setInputFormatClass(TextInputFormat.class);
   job.setOutputFormatClass(TextOutputFormat.class);
 
   FileInputFormat.addInputPath(job, new Path("In"));
   FileOutputFormat.setOutputPath(job, new Path("Out"));
 
   job.setMapperClass(NOOPMapper.class);
   job.setReducerClass(NOOPReducer.class);
 
   job.waitForCompletion(true);
 
 
 However, I get this message:
 java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
 cast to org.apache.hadoop.io.Text
   at TestDriver$NOOPMapper.map(TestDriver.java:1)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
 11/08/01 16:41:01 INFO mapred.JobClient:  map 0% reduce 0%
 11/08/01 16:41:01 INFO mapred.JobClient: Job complete: job_local_0001
 11/08/01 16:41:01 INFO mapred.JobClient: Counters: 0
 
 
 
 Can anyone tell me what I need to make this work?
 
 Attached is the full code:
 http://old.nabble.com/file/p32174859/TestDriver.java TestDriver.java 
 



Re: mapred vs mapreduce api

2011-08-05 Thread Mohit Anchlia
On Fri, Aug 5, 2011 at 3:42 PM, Stevens, Keith D. steven...@llnl.gov wrote:
 The Mapper and Reducer classes in org.apache.hadoop.mapreduce implement the 
 identity function.  So you should be able to just do

 job.setMapperClass(org.apache.hadoop.mapreduce.Mapper.class);
 job.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class);

 without having to implement your own no-op classes.

 [...]

Sorry for asking on this thread :) Does the Definitive Guide, 2nd edition, cover the new API?





Re: mapred vs mapreduce api

2011-08-01 Thread Roger Chen
Your reducer is writing IntWritable but your output format class is still
Text. Change one of those so they match the other.
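
For what it's worth, the trace itself also points at the map input types:
TextInputFormat hands the mapper <LongWritable, Text> records (byte offset,
line), not <Text, IntWritable>, which is where the LongWritable-to-Text cast
fails. A sketch of signatures with consistent types throughout, reusing your
class names (these would sit inside TestDriver, as before) and assuming a
plain pass-through is the goal:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public static class NOOPMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {
  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(key, value);  // pass the record through unchanged
  }
}

public static class NOOPReducer
    extends Reducer<LongWritable, Text, LongWritable, Text> {
  @Override
  public void reduce(LongWritable key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {  // emit each grouped value as-is
      context.write(key, value);
    }
  }
}

The driver would then need job.setOutputKeyClass(LongWritable.class) and
job.setOutputValueClass(Text.class) to match.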

On Mon, Aug 1, 2011 at 8:40 PM, garpinc garp...@hotmail.com wrote:
 [...]



-- 
Roger Chen
UC Davis Genome Center