Re: why reduce doesn't work by HBase?

2012-11-16 Thread yonghu
Thanks for your suggestions.

On Thu, Nov 15, 2012 at 9:56 PM, Bertrand Dechoux decho...@gmail.com wrote:
 Hi Yong,

 A few ways to understand a problem (with Hadoop) :
 1) Write tests, using MRUnit for example (http://mrunit.apache.org/)
 2) Use @Override to make sure you are indeed overriding a method
 3) Use a break point while debugging

 The answer to your current problem: the o.a.h.mapreduce.Reducer reduce method has no
 Iterator parameter, but it does have an Iterable parameter...

 Regards

 Bertrand

 PS : It is absolutely not related to HBase.

 On Thu, Nov 15, 2012 at 8:42 PM, yonghu yongyong...@gmail.com wrote:

 Dear all,

 I use HBase as data source and hdfs as data sink. I wrote the program
 which calculate the versions for each cell as follows:

 public class ReturnMore { // only the Map side works; it already returns the 3 versions

    public static class Map extends TableMapper<Text, Text> {
       private Text outkey = new Text();
       private Text outval = new Text();
       public void map(ImmutableBytesWritable key, Result values, Context context) {
  KeyValue[] kvs = values.raw();
  String row_key;
  String col_fam;
  String col;
  String val;
  String finalKey;
  String finalValue;
  for(KeyValue kv : kvs){
 row_key = Bytes.toString(kv.getRow());
 col_fam = Bytes.toString(kv.getFamily());
 col = Bytes.toString(kv.getQualifier());
 val = Bytes.toString(kv.getValue());
 finalKey = row_key + "/" + col_fam + "/" + col;
 finalValue = new Long(kv.getTimestamp()).toString() + "/" + val;
 outkey.set(finalKey);
 outval.set(finalValue);
 try {
context.write(outkey, outval);
 } catch (IOException e) {
e.printStackTrace();
 } catch (InterruptedException e) {
e.printStackTrace();
 }
  }
   }
}

 public static class Reduce extends Reducer<Text, Text, Text, IntWritable> { // the desired output is key + 3
    private IntWritable occ = new IntWritable();
    public void reduce(Text key, Iterator<Text> values, Context context) {
  int i = 0;
  while(values.hasNext()){
 i++;
  }
  occ.set(i);
  try {
 context.write(key, occ);
  } catch (IOException e) {
 e.printStackTrace();
  } catch (InterruptedException e) {
 e.printStackTrace();
  }
   }
}
public static void main(String[] args) throws Exception{
   Configuration conf = HBaseConfiguration.create();
   Job job = new Job(conf);
   job.setJarByClass(ReturnMore.class);
   Scan scan = new Scan();
   scan.setMaxVersions();
   job.setReducerClass(Reduce.class);
   job.setOutputKeyClass(Text.class);
   job.setOutputValueClass(IntWritable.class);
   job.setNumReduceTasks(1);
    TableMapReduceUtil.initTableMapperJob("test", scan, Map.class, Text.class, Text.class, job);
   FileSystem file = FileSystem.get(conf);
    Path path = new Path("/hans/test/");
   if(file.exists(path)){
  file.delete(path,true);
   }
   FileOutputFormat.setOutputPath(job, path);
   System.exit(job.waitForCompletion(true)?0:1);
}
 }

 I tested this code with both HBase 0.92.1 and 0.94. However, when I run
 it, it always outputs the content of each cell instead of the output I
 defined in the reduce function (key + occurrence count for each cell).
 Can anyone give me advice? By the way, I run it in pseudo-distributed
 mode.

 regards!

 Yong




 --
 Bertrand Dechoux


Re: why reduce doesn't work by HBase?

2012-11-15 Thread Bertrand Dechoux
Hi Yong,

A few ways to understand a problem (with Hadoop) :
1) Write tests, using MRUnit for example (http://mrunit.apache.org/)
2) Use @Override to make sure you are indeed overriding a method
3) Use a break point while debugging

The answer to your current problem: the o.a.h.mapreduce.Reducer reduce method has no
Iterator parameter, but it does have an Iterable parameter...
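
The point can be demonstrated without Hadoop at all. A minimal sketch (the class and method names below are illustrative stand-ins, not Hadoop's real API): declaring the parameter as Iterator instead of Iterable creates an overload rather than an override, so dynamic dispatch falls back to the base class's pass-through implementation, which is exactly why the map output appears unchanged. Adding @Override to the subclass method would turn the mistake into a compile error.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OverrideDemo {
    // Stand-in for o.a.h.mapreduce.Reducer (hypothetical, no Hadoop needed):
    // the base reduce() takes an Iterable and, like Hadoop's default
    // implementation, just passes the values through unchanged.
    static class BaseReducer {
        String reduce(String key, Iterable<String> values) {
            StringBuilder sb = new StringBuilder();
            for (String v : values) sb.append(v).append(' ');
            return key + " -> " + sb.toString().trim();
        }
    }

    // Declaring Iterator instead of Iterable creates an OVERLOAD, not an
    // override; writing @Override here would be a compile error, which is
    // how the mistake gets caught early.
    static class CountingReducer extends BaseReducer {
        String reduce(String key, Iterator<String> values) {
            int i = 0;
            while (values.hasNext()) { values.next(); i++; }
            return key + " -> " + i;
        }
    }

    public static void main(String[] args) {
        List<String> vals = Arrays.asList("a", "b", "c");
        BaseReducer r = new CountingReducer();
        // Dispatch picks the Iterable version from BaseReducer,
        // so the counting code never runs:
        System.out.println(r.reduce("k", vals)); // prints k -> a b c
    }
}
```

Calling `r.reduce("k", vals.iterator())` on a `CountingReducer` reference would hit the counting overload and print `k -> 3`, which is the behaviour the original code expected from the framework.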

Regards

Bertrand

PS : It is absolutely not related to HBase.

On Thu, Nov 15, 2012 at 8:42 PM, yonghu yongyong...@gmail.com wrote:

 Dear all,

 I use HBase as data source and hdfs as data sink. I wrote the program
 which calculate the versions for each cell as follows:

 public class ReturnMore { // only the Map side works; it already returns the 3 versions

    public static class Map extends TableMapper<Text, Text> {
       private Text outkey = new Text();
       private Text outval = new Text();
       public void map(ImmutableBytesWritable key, Result values, Context context) {
  KeyValue[] kvs = values.raw();
  String row_key;
  String col_fam;
  String col;
  String val;
  String finalKey;
  String finalValue;
  for(KeyValue kv : kvs){
 row_key = Bytes.toString(kv.getRow());
 col_fam = Bytes.toString(kv.getFamily());
 col = Bytes.toString(kv.getQualifier());
 val = Bytes.toString(kv.getValue());
 finalKey = row_key + "/" + col_fam + "/" + col;
 finalValue = new Long(kv.getTimestamp()).toString() + "/" + val;
 outkey.set(finalKey);
 outval.set(finalValue);
 try {
context.write(outkey, outval);
 } catch (IOException e) {
e.printStackTrace();
 } catch (InterruptedException e) {
e.printStackTrace();
 }
  }
   }
}

 public static class Reduce extends Reducer<Text, Text, Text, IntWritable> { // the desired output is key + 3
    private IntWritable occ = new IntWritable();
    public void reduce(Text key, Iterator<Text> values, Context context) {
  int i = 0;
  while(values.hasNext()){
 i++;
  }
  occ.set(i);
  try {
 context.write(key, occ);
  } catch (IOException e) {
 e.printStackTrace();
  } catch (InterruptedException e) {
 e.printStackTrace();
  }
   }
}
public static void main(String[] args) throws Exception{
   Configuration conf = HBaseConfiguration.create();
   Job job = new Job(conf);
   job.setJarByClass(ReturnMore.class);
   Scan scan = new Scan();
   scan.setMaxVersions();
   job.setReducerClass(Reduce.class);
   job.setOutputKeyClass(Text.class);
   job.setOutputValueClass(IntWritable.class);
   job.setNumReduceTasks(1);
    TableMapReduceUtil.initTableMapperJob("test", scan, Map.class, Text.class, Text.class, job);
   FileSystem file = FileSystem.get(conf);
    Path path = new Path("/hans/test/");
   if(file.exists(path)){
  file.delete(path,true);
   }
   FileOutputFormat.setOutputPath(job, path);
   System.exit(job.waitForCompletion(true)?0:1);
}
 }

 I tested this code with both HBase 0.92.1 and 0.94. However, when I run
 it, it always outputs the content of each cell instead of the output I
 defined in the reduce function (key + occurrence count for each cell).
 Can anyone give me advice? By the way, I run it in pseudo-distributed
 mode.

 regards!

 Yong




-- 
Bertrand Dechoux