Moving to u...@hadoop.apache.org. If you have a question about this, please reply to the user mailing list instead of mapreduce-dev@.
Thanks,
Akira

(2014/02/17 10:06), Akira AJISAKA wrote:
>> I know the map method puts these text files into a map, like the following, right?
>> <001, 35.99>
>> <001, 35.99>
>> <002, 12.49>
>> <004, 13.42>
>> <003, 499.99>
>> <001 ,78.95>
>> <002, 21.99>
>> <002, 93.45>
>> <001, 9.99>
>> <001, John Allen>
>> <002, Abigail Smith>
>> <003, April Stevens>
>> <004, Nasser Hafez>
>
> The following outputs are the correct ones.
>
> <001,sales 35.99>
> <002,sales 12.49>
> <004,sales 13.42>
> <003,sales 499.99>
> <001,sales 78.95>
> <002,sales 21.99>
> <002,sales 93.45>
> <001,sales 9.99>
> <001,accounts John Allen>
> <002,accounts Abigail Smith>
> <003,accounts April Stevens>
> <004,accounts Nasser Hafez>
>
> The outputs are grouped and sorted by key, and the reducers process each
> group. The inputs of the reduce method are as follows:
>
> <key: 001,
>  values: {sales 35.99, sales 78.95, sales 9.99, accounts John Allen}>
> <key: 002,
>  values: {sales 12.49, sales 21.99, sales 93.45, accounts Abigail Smith}>
> <key: 003,
>  values: {sales 499.99, accounts April Stevens}>
> <key: 004,
>  values: {sales 13.42, accounts Nasser Hafez}>
>
> Regards,
> Akira
>
> (2014/02/17 1:14), EdwardKing wrote:
>> Hello everyone,
>> I am a newbie to Hadoop 2.2.0 and I am puzzled by the reduce method. I have
>> two text files, sales.txt and account.txt, as follows:
>> sales.txt
>> 001    35.99    2012-03-15
>> 002    12.49    2004-07-02
>> 004    13.42    2005-12-20
>> 003    499.99   2010-12-20
>> 001    78.95    2012-04-02
>> 002    21.99    2006-11-30
>> 002    93.45    2008-09-10
>> 001    9.99     2012-05-17
>>
>> account.txt
>> 001    John Allen       Standard    2012-03-15
>> 002    Abigail Smith    Premium     2004-07-13
>> 003    April Stevens    Standard    2010-12-20
>> 004    Nasser Hafez     Premium     2001-04-23
>>
>> ReduceJoin.java is as follows:
>> import java.io.*;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Job;
>> import org.apache.hadoop.mapreduce.Mapper;
>> import org.apache.hadoop.mapreduce.Reducer;
>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>> import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
>> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
>>
>> public class ReduceJoin
>> {
>>
>>   public static class SalesRecordMapper
>>       extends Mapper<Object, Text, Text, Text> {
>>
>>     public void map(Object key, Text value, Context context
>>         ) throws IOException, InterruptedException
>>     {
>>       String record = value.toString();
>>       String[] parts = record.split("\t");
>>
>>       context.write(new Text(parts[0]), new Text("sales\t" + parts[1]));
>>     }
>>   }
>>
>>   public static class AccountRecordMapper
>>       extends Mapper<Object, Text, Text, Text> {
>>
>>     public void map(Object key, Text value, Context context
>>         ) throws IOException, InterruptedException
>>     {
>>       String record = value.toString();
>>       String[] parts = record.split("\t");
>>
>>       context.write(new Text(parts[0]), new Text("accounts\t" + parts[1]));
>>     }
>>   }
>>
>>   public static class ReduceJoinReducer
>>       extends Reducer<Text, Text, Text, Text>
>>   {
>>
>>     public void reduce(Text key, Iterable<Text> values,
>>         Context context
>>         ) throws IOException, InterruptedException
>>     {
>>       String name = "";
>>       double total = 0.0;
>>       int count = 0;
>>
>>       for (Text t : values)
>>       {
>>         String parts[] = t.toString().split("\t");
>>
>>         if (parts[0].equals("sales"))
>>         {
>>           count++;
>>           total += Float.parseFloat(parts[1]);
>>         }
>>         else if (parts[0].equals("accounts"))
>>         {
>>           name = parts[1];
>>         }
>>       }
>>
>>       String str = String.format("%d\t%f", count, total);
>>       context.write(new Text(name), new Text(str));
>>     }
>>   }
>>
>>   public static void main(String[] args) throws Exception {
>>     Configuration conf = new Configuration();
>>     Job job = new Job(conf, "Reduce-side join");
>>     job.setJarByClass(ReduceJoin.class);
>>     job.setReducerClass(ReduceJoinReducer.class);
>>     job.setOutputKeyClass(Text.class);
>>     job.setOutputValueClass(Text.class);
>>     MultipleInputs.addInputPath(job, new Path(args[0]),
>>         TextInputFormat.class, SalesRecordMapper.class);
>>     MultipleInputs.addInputPath(job, new Path(args[1]),
>>         TextInputFormat.class, AccountRecordMapper.class);
>>     // FileOutputFormat.setOutputPath(job, new Path(args[2]));
>>     Path outputPath = new Path(args[2]);
>>     FileOutputFormat.setOutputPath(job, outputPath);
>>     outputPath.getFileSystem(conf).delete(outputPath);
>>
>>     System.exit(job.waitForCompletion(true) ? 0 : 1);
>>   }
>> }
>>
>> I created join.jar and ran it:
>> $ hadoop jar join.jar ReduceJoin sales accounts outputs
>> $ hadoop fs -cat /user/garry/outputs/part-r-00000
>> John Allen 3 124.929998
>> Abigail Smith 3 127.929996
>> April Stevens 1 499.989990
>> Nasser Hafez 1 13.420000
>>
>> I know the map method puts these text files into a map, like the following, right?
>> <001, 35.99>
>> <001, 35.99>
>> <002, 12.49>
>> <004, 13.42>
>> <003, 499.99>
>> <001 ,78.95>
>> <002, 21.99>
>> <002, 93.45>
>> <001, 9.99>
>> <001, John Allen>
>> <002, Abigail Smith>
>> <003, April Stevens>
>> <004, Nasser Hafez>
>>
>> But I don't understand the reduce method. How does it produce the following
>> result? Could anyone give the detailed steps that produce it? Thanks in
>> advance.
>> John Allen 3 124.929998
>> Abigail Smith 3 127.929996
>> April Stevens 1 499.989990
>> Nasser Hafez 1 13.420000
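
To make the steps in the thread concrete, here is a minimal plain-Java sketch (no Hadoop classes) of what happens between map and reduce for this data. The class name ReduceJoinSketch and the helpers mapOutput() and shuffle() are made up for illustration. A TreeMap stands in for the framework's sort-and-group step, and reduce() copies the logic of ReduceJoinReducer. In a real job the order of the values within one key is not guaranteed, and Locale.ROOT is added here only so the number format does not depend on the machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the shuffle and reduce steps; not Hadoop code.
class ReduceJoinSketch {

    // Map output as listed in the reply: {key, tagged value}.
    static List<String[]> mapOutput() {
        String[][] pairs = {
            {"001", "sales\t35.99"}, {"002", "sales\t12.49"},
            {"004", "sales\t13.42"}, {"003", "sales\t499.99"},
            {"001", "sales\t78.95"}, {"002", "sales\t21.99"},
            {"002", "sales\t93.45"}, {"001", "sales\t9.99"},
            {"001", "accounts\tJohn Allen"}, {"002", "accounts\tAbigail Smith"},
            {"003", "accounts\tApril Stevens"}, {"004", "accounts\tNasser Hafez"}
        };
        List<String[]> out = new ArrayList<>();
        for (String[] p : pairs) {
            out.add(p);
        }
        return out;
    }

    // Stand-in for the shuffle: collect values per key, keys in sorted order.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] p : pairs) {
            groups.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        return groups;
    }

    // Same logic as ReduceJoinReducer.reduce, returning the output line.
    static String reduce(String key, List<String> values) {
        String name = "";
        double total = 0.0;
        int count = 0;
        for (String v : values) {
            String[] parts = v.split("\t");
            if (parts[0].equals("sales")) {
                count++;
                // Parsed as float, as in the original code, then widened to double.
                total += Float.parseFloat(parts[1]);
            } else if (parts[0].equals("accounts")) {
                name = parts[1];
            }
        }
        // Output key (name) and value (count, total) are separated by a tab.
        return name + "\t" + String.format(Locale.ROOT, "%d\t%f", count, total);
    }

    public static void main(String[] args) {
        // One reduce call per key group, in key order.
        for (Map.Entry<String, List<String>> e : shuffle(mapOutput()).entrySet()) {
            System.out.println(reduce(e.getKey(), e.getValue()));
        }
    }
}
```

Run as-is, this should print the same four lines as part-r-00000 in the thread. The trailing digits (124.929998 rather than 124.93) come from Float.parseFloat: each amount is rounded to a float before it is added to the double total. Switching to Double.parseDouble would likely give totals closer to the decimal inputs.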