Hello everyone, I am a newbie to Hadoop 2.2.0 and I am puzzled by the reduce method. I have two text files, sales.txt and account.txt, as follows:

sales.txt:

    001	35.99	2012-03-15
    002	12.49	2004-07-02
    004	13.42	2005-12-20
    003	499.99	2010-12-20
    001	78.95	2012-04-02
    002	21.99	2006-11-30
    002	93.45	2008-09-10
    001	9.99	2012-05-17
account.txt:

    001	John Allen	Standard	2012-03-15
    002	Abigail Smith	Premium	2004-07-13
    003	April Stevens	Standard	2010-12-20
    004	Nasser Hafez	Premium	2001-04-23

ReduceJoin.java is as follows:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ReduceJoin {

        public static class SalesRecordMapper
                extends Mapper<Object, Text, Text, Text> {
            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] parts = value.toString().split("\t");
                // Tag each sale with "sales" so the reducer can tell the
                // two inputs apart, e.g. (001, "sales\t35.99")
                context.write(new Text(parts[0]), new Text("sales\t" + parts[1]));
            }
        }

        public static class AccountRecordMapper
                extends Mapper<Object, Text, Text, Text> {
            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] parts = value.toString().split("\t");
                // Tag each account record with "accounts",
                // e.g. (001, "accounts\tJohn Allen")
                context.write(new Text(parts[0]), new Text("accounts\t" + parts[1]));
            }
        }

        public static class ReduceJoinReducer
                extends Reducer<Text, Text, Text, Text> {
            @Override
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                String name = "";
                double total = 0.0;
                int count = 0;
                // All values for one account id arrive in the same call;
                // the tag tells us which input each value came from
                for (Text t : values) {
                    String[] parts = t.toString().split("\t");
                    if (parts[0].equals("sales")) {
                        count++;
                        total += Float.parseFloat(parts[1]);
                    } else if (parts[0].equals("accounts")) {
                        name = parts[1];
                    }
                }
                String str = String.format("%d\t%f", count, total);
                context.write(new Text(name), new Text(str));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "Reduce-side join");
            job.setJarByClass(ReduceJoin.class);
            job.setReducerClass(ReduceJoinReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            MultipleInputs.addInputPath(job, new Path(args[0]),
                    TextInputFormat.class, SalesRecordMapper.class);
            MultipleInputs.addInputPath(job, new Path(args[1]),
                    TextInputFormat.class, AccountRecordMapper.class);
            Path outputPath = new Path(args[2]);
            FileOutputFormat.setOutputPath(job, outputPath);
            outputPath.getFileSystem(conf).delete(outputPath, true);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

I built join.jar and ran it:

    $ hadoop jar join.jar ReduceJoin sales accounts outputs
    $ hadoop fs -cat /user/garry/outputs/part-r-00000
    John Allen	3	124.929998
    Abigail Smith	3	127.929996
    April Stevens	1	499.989990
    Nasser Hafez	1	13.420000

If I understand correctly, the two map methods turn the input files into tagged key/value pairs like the following, right?

    <001, sales	35.99>
    <002, sales	12.49>
    <004, sales	13.42>
    <003, sales	499.99>
    <001, sales	78.95>
    <002, sales	21.99>
    <002, sales	93.45>
    <001, sales	9.99>
    <001, accounts	John Allen>
    <002, accounts	Abigail Smith>
    <003, accounts	April Stevens>
    <004, accounts	Nasser Hafez>

But I don't understand the reduce method. How does it produce the following result? Could anyone give the detailed steps? Thanks in advance.

    John Allen	3	124.929998
    Abigail Smith	3	127.929996
    April Stevens	1	499.989990
    Nasser Hafez	1	13.420000
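To make the step between map and reduce concrete, here is a small plain-Java simulation (no Hadoop needed) of what the framework does for this job: the shuffle groups all tagged values by account id, then reduce() is called once per key with that group. The class name ReduceJoinTrace and the helper method names are my own, not part of the Hadoop API; the reduce logic mirrors ReduceJoinReducer above.

```java
import java.util.*;

public class ReduceJoinTrace {

    // Simulate the shuffle: group the tagged map outputs by key, the way
    // Hadoop collects them before handing each group to reduce().
    static Map<String, List<String>> shuffle() {
        String[][] mapOutput = {
            {"001", "sales\t35.99"},  {"002", "sales\t12.49"},
            {"004", "sales\t13.42"},  {"003", "sales\t499.99"},
            {"001", "sales\t78.95"},  {"002", "sales\t21.99"},
            {"002", "sales\t93.45"},  {"001", "sales\t9.99"},
            {"001", "accounts\tJohn Allen"},   {"002", "accounts\tAbigail Smith"},
            {"003", "accounts\tApril Stevens"},{"004", "accounts\tNasser Hafez"},
        };
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapOutput)
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        return grouped;
    }

    // Same logic as ReduceJoinReducer.reduce(): one call per key.
    // "accounts" values supply the name; "sales" values are counted and summed.
    static String reduce(List<String> values) {
        String name = "";
        double total = 0.0;
        int count = 0;
        for (String v : values) {
            String[] parts = v.split("\t");
            if (parts[0].equals("sales")) {
                count++;
                total += Float.parseFloat(parts[1]);  // float precision explains
                                                      // totals like 124.929998
            } else if (parts[0].equals("accounts")) {
                name = parts[1];
            }
        }
        // Locale pinned so the decimal format matches the posted output
        return String.format(Locale.US, "%s\t%d\t%f", name, count, total);
    }

    public static void main(String[] args) {
        // e.g. key 001 gets ["sales\t35.99","sales\t78.95","sales\t9.99",
        // "accounts\tJohn Allen"] -> "John Allen\t3\t124.929998"
        for (Map.Entry<String, List<String>> e : shuffle().entrySet())
            System.out.println(reduce(e.getValue()));
    }
}
```

Running main prints the four joined records, matching the job's part-r-00000 output: each account id's group contains exactly one "accounts" value (the name) plus all of that customer's "sales" values, so reduce() can emit name, sale count, and total in one pass.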