Moving to [email protected].
If you have a question about this, please reply to the
user mailing list instead of mapreduce-dev@.
Thanks,
Akira
(2014/02/17 10:06), Akira AJISAKA wrote:
>> I know the map method turns these text files into key/value pairs like
>> the following, right?
>> <001, 35.99>
>> <002, 12.49>
>> <004, 13.42>
>> <003, 499.99>
>> <001, 78.95>
>> <002, 21.99>
>> <002, 93.45>
>> <001, 9.99>
>> <001, John Allen>
>> <002, Abigail Smith>
>> <003, April Stevens>
>> <004, Nasser Hafez>
>
> The following outputs are correct:
>
> <001,sales 35.99>
> <002,sales 12.49>
> <004,sales 13.42>
> <003,sales 499.99>
> <001,sales 78.95>
> <002,sales 21.99>
> <002,sales 93.45>
> <001,sales 9.99>
> <001,accounts John Allen>
> <002,accounts Abigail Smith>
> <003,accounts April Stevens>
> <004,accounts Nasser Hafez>
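
The tag is what makes the join work: each mapper prefixes its value with the
name of the table it came from, so the reducer can tell the two inputs apart
after they are mixed together. A minimal plain-Java sketch of the tagging step
(no Hadoop; the class and method names here are illustrative):

```java
// Sketch of the tagging step performed by the two mappers.
// Input lines are tab-separated; field 0 is the join key.
public class TagDemo {

    // SalesRecordMapper: emit (id, "sales" + TAB + amount)
    public static String tagSales(String line) {
        String[] parts = line.split("\t");
        return parts[0] + " -> sales\t" + parts[1];
    }

    // AccountRecordMapper: emit (id, "accounts" + TAB + name)
    public static String tagAccounts(String line) {
        String[] parts = line.split("\t");
        return parts[0] + " -> accounts\t" + parts[1];
    }

    public static void main(String[] args) {
        // Prints: 001 -> sales	35.99
        System.out.println(tagSales("001\t35.99\t2012-03-15"));
        // Prints: 001 -> accounts	John Allen
        System.out.println(tagAccounts("001\tJohn Allen\tStandard\t2012-03-15"));
    }
}
```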
>
> The outputs are grouped and sorted by key, and the reducer processes each
> group. The inputs to the reduce method are as follows:
>
> <key: 001,
> values: {sales 35.99, sales 78.95, sales 9.99, accounts John Allen}>
> <key: 002,
> values: {sales 12.49, sales 21.99, sales 93.45, accounts Abigail Smith}>
> <key: 003,
> values: {sales 499.99, accounts April Stevens}>
> <key: 004,
> values: {sales 13.42, accounts Nasser Hafez}>
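
To make the reduce step concrete, here is a standalone plain-Java sketch (no
Hadoop; the class and method names are illustrative) of what ReduceJoinReducer
does with the key-001 group above. It keeps Float.parseFloat as in the posted
code, which is why the total prints as 124.929998 rather than 124.930000:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Walk-through of the reduce step for a single key group.
// The tag ("sales" vs "accounts") tells the reducer which
// table each grouped value originally came from.
public class ReduceStepDemo {

    // Mirrors ReduceJoinReducer.reduce() for one key group.
    public static String reduceOneKey(List<String> values) {
        String name = "";
        double total = 0.0;
        int count = 0;
        for (String v : values) {
            String[] parts = v.split("\t");
            if (parts[0].equals("sales")) {
                count++;                              // one more sale for this customer
                total += Float.parseFloat(parts[1]);  // rounds to float first, as in the posted code
            } else if (parts[0].equals("accounts")) {
                name = parts[1];                      // the joined-in customer name
            }
        }
        return String.format(Locale.ROOT, "%s\t%d\t%f", name, count, total);
    }

    public static void main(String[] args) {
        List<String> key001 = Arrays.asList(
                "sales\t35.99", "sales\t78.95", "sales\t9.99", "accounts\tJohn Allen");
        // Prints: John Allen	3	124.929998
        System.out.println(reduceOneKey(key001));
    }
}
```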
>
> Regards,
> Akira
>
> (2014/02/17 1:14), EdwardKing wrote:
>> Hello every,
>> I am a newbie to Hadoop 2.2.0 and I am puzzled by the reduce method. I have
>> two text files, sales.txt and account.txt, as follows:
>> sales.txt
>> 001 35.99 2012-03-15
>> 002 12.49 2004-07-02
>> 004 13.42 2005-12-20
>> 003 499.99 2010-12-20
>> 001 78.95 2012-04-02
>> 002 21.99 2006-11-30
>> 002 93.45 2008-09-10
>> 001 9.99 2012-05-17
>>
>> account.txt
>> 001 John Allen Standard 2012-03-15
>> 002 Abigail Smith Premium 2004-07-13
>> 003 April Stevens Standard 2010-12-20
>> 004 Nasser Hafez Premium 2001-04-23
>>
>> ReduceJoin.java is follows:
>> import java.io.* ;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Job;
>> import org.apache.hadoop.mapreduce.Mapper;
>> import org.apache.hadoop.mapreduce.Reducer;
>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>> import org.apache.hadoop.mapreduce.lib.input.MultipleInputs ;
>> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat ;
>>
>> public class ReduceJoin
>> {
>>
>> public static class SalesRecordMapper
>> extends Mapper<Object, Text, Text, Text>{
>>
>> public void map(Object key, Text value, Context context
>> ) throws IOException, InterruptedException
>> {
>> String record = value.toString() ;
>> String[] parts = record.split("\t") ;
>>
>> context.write(new Text(parts[0]), new
>> Text("sales\t"+parts[1])) ;
>> }
>> }
>>
>> public static class AccountRecordMapper
>> extends Mapper<Object, Text, Text, Text>{
>>
>> public void map(Object key, Text value, Context context
>> ) throws IOException, InterruptedException
>> {
>> String record = value.toString() ;
>> String[] parts = record.split("\t") ;
>>
>> context.write(new Text(parts[0]), new
>> Text("accounts\t"+parts[1])) ;
>> }
>> }
>>
>> public static class ReduceJoinReducer
>> extends Reducer<Text, Text, Text, Text>
>> {
>>
>> public void reduce(Text key, Iterable<Text> values,
>> Context context
>> ) throws IOException, InterruptedException
>> {
>> String name = "" ;
>> double total = 0.0 ;
>> int count = 0 ;
>>
>> for(Text t: values)
>> {
>> String parts[] = t.toString().split("\t") ;
>>
>> if (parts[0].equals("sales"))
>> {
>> count++ ;
>> total += Float.parseFloat(parts[1]); // rounds to float first, hence 124.929998
>> }
>> else if (parts[0].equals("accounts"))
>> {
>> name = parts[1] ;
>> }
>> }
>>
>> String str = String.format("%d\t%f", count, total) ;
>> context.write(new Text(name), new Text(str)) ;
>> }
>> }
>>
>> public static void main(String[] args) throws Exception {
>> Configuration conf = new Configuration();
>> Job job = Job.getInstance(conf, "Reduce-side join"); // new Job(...) is deprecated
>> job.setJarByClass(ReduceJoin.class);
>> job.setReducerClass(ReduceJoinReducer.class);
>> job.setOutputKeyClass(Text.class);
>> job.setOutputValueClass(Text.class);
>> MultipleInputs.addInputPath(job, new Path(args[0]),
>> TextInputFormat.class, SalesRecordMapper.class) ;
>> MultipleInputs.addInputPath(job, new Path(args[1]),
>> TextInputFormat.class, AccountRecordMapper.class) ;
>> // FileOutputFormat.setOutputPath(job, new Path(args[2]));
>> Path outputPath = new Path(args[2]);
>> FileOutputFormat.setOutputPath(job, outputPath);
>> outputPath.getFileSystem(conf).delete(outputPath, true); // delete(Path) is deprecated
>>
>> System.exit(job.waitForCompletion(true) ? 0 : 1);
>> }
>> }
>>
>> I created join.jar and ran it:
>> $ hadoop jar join.jar ReduceJoin sales accounts outputs
>> $ hadoop fs -cat /user/garry/outputs/part-r-00000
>> John Allen 3 124.929998
>> Abigail Smith 3 127.929996
>> April Stevens 1 499.989990
>> Nasser Hafez 1 13.420000
>>
>> I know the map method turns these text files into key/value pairs like
>> the following, right?
>> <001, 35.99>
>> <002, 12.49>
>> <004, 13.42>
>> <003, 499.99>
>> <001, 78.95>
>> <002, 21.99>
>> <002, 93.45>
>> <001, 9.99>
>> <001, John Allen>
>> <002, Abigail Smith>
>> <003, April Stevens>
>> <004, Nasser Hafez>
>>
>> But I don't understand the reduce method. How does it produce the following
>> result? Could anyone give the detailed steps? Thanks in advance.
>> John Allen 3 124.929998
>> Abigail Smith 3 127.929996
>> April Stevens 1 499.989990
>> Nasser Hafez 1 13.420000
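
On why the totals print as 124.929998 rather than 124.93: the reducer parses
each amount with Float.parseFloat, so every value is rounded to 32-bit float
precision before being added to the double total. Parsing with
Double.parseDouble avoids the visible drift. A small comparison (the class and
method names here are illustrative):

```java
import java.util.Locale;

// Compares summing the key-001 amounts after rounding each to
// float precision versus summing at full double precision.
public class FloatVsDouble {

    // Each value is rounded to float before being accumulated.
    public static double sumViaFloat(String[] amounts) {
        double total = 0.0;
        for (String s : amounts) total += Float.parseFloat(s);
        return total;
    }

    // Each value keeps full double precision.
    public static double sumViaDouble(String[] amounts) {
        double total = 0.0;
        for (String s : amounts) total += Double.parseDouble(s);
        return total;
    }

    public static void main(String[] args) {
        String[] key001 = {"35.99", "78.95", "9.99"};
        // Prints: 124.929998
        System.out.println(String.format(Locale.ROOT, "%f", sumViaFloat(key001)));
        // Prints: 124.930000
        System.out.println(String.format(Locale.ROOT, "%f", sumViaDouble(key001)));
    }
}
```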
>>
>>
>>
>>
>