Hello everyone, I am a newbie to Hadoop 2.2.0 and I am puzzled by the reduce method. I have two text files, sales.txt and account.txt, as follows:

sales.txt:

    001	35.99	2012-03-15
    002	12.49	2004-07-02
    004	13.42	2005-12-20
    003	499.99	2010-12-20
    001	78.95	2012-04-02
    002	21.99	2006-11-30
    002	93.45	2008-09-10
    001	9.99	2012-05-17
account.txt:

    001	John Allen	Standard	2012-03-15
    002	Abigail Smith	Premium	2004-07-13
    003	April Stevens	Standard	2010-12-20
    004	Nasser Hafez	Premium	2001-04-23

ReduceJoin.java is as follows:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ReduceJoin {

        public static class SalesRecordMapper
                extends Mapper<Object, Text, Text, Text> {
            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] parts = value.toString().split("\t");
                // Tag each sale with "sales" so the reducer can tell the
                // two inputs apart, e.g. (001, "sales\t35.99")
                context.write(new Text(parts[0]), new Text("sales\t" + parts[1]));
            }
        }

        public static class AccountRecordMapper
                extends Mapper<Object, Text, Text, Text> {
            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] parts = value.toString().split("\t");
                // Tag each account record with "accounts",
                // e.g. (001, "accounts\tJohn Allen")
                context.write(new Text(parts[0]), new Text("accounts\t" + parts[1]));
            }
        }

        public static class ReduceJoinReducer
                extends Reducer<Text, Text, Text, Text> {
            @Override
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                String name = "";
                double total = 0.0;
                int count = 0;
                // All values for one account id arrive in the same call;
                // the tag tells us which input each value came from
                for (Text t : values) {
                    String[] parts = t.toString().split("\t");
                    if (parts[0].equals("sales")) {
                        count++;
                        total += Float.parseFloat(parts[1]);
                    } else if (parts[0].equals("accounts")) {
                        name = parts[1];
                    }
                }
                String str = String.format("%d\t%f", count, total);
                context.write(new Text(name), new Text(str));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "Reduce-side join");
            job.setJarByClass(ReduceJoin.class);
            job.setReducerClass(ReduceJoinReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            MultipleInputs.addInputPath(job, new Path(args[0]),
                    TextInputFormat.class, SalesRecordMapper.class);
            MultipleInputs.addInputPath(job, new Path(args[1]),
                    TextInputFormat.class, AccountRecordMapper.class);
            Path outputPath = new Path(args[2]);
            FileOutputFormat.setOutputPath(job, outputPath);
            outputPath.getFileSystem(conf).delete(outputPath, true);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

I built join.jar and ran it:

    $ hadoop jar join.jar ReduceJoin sales accounts outputs
    $ hadoop fs -cat /user/garry/outputs/part-r-00000
    John Allen	3	124.929998
    Abigail Smith	3	127.929996
    April Stevens	1	499.989990
    Nasser Hafez	1	13.420000

If I understand correctly, the two map methods turn the input files into tagged key/value pairs like the following, right?

    <001, sales	35.99>
    <002, sales	12.49>
    <004, sales	13.42>
    <003, sales	499.99>
    <001, sales	78.95>
    <002, sales	21.99>
    <002, sales	93.45>
    <001, sales	9.99>
    <001, accounts	John Allen>
    <002, accounts	Abigail Smith>
    <003, accounts	April Stevens>
    <004, accounts	Nasser Hafez>

But I don't understand the reduce method. How does it produce the following result? Could anyone give the detailed steps? Thanks in advance.

    John Allen	3	124.929998
    Abigail Smith	3	127.929996
    April Stevens	1	499.989990
    Nasser Hafez	1	13.420000
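To make the step between map and reduce concrete, here is a small plain-Java simulation (no Hadoop needed) of what the framework does for this job: the shuffle groups all tagged values by account id, then reduce() is called once per key with that group. The class name ReduceJoinTrace and the helper method names are my own, not part of the Hadoop API; the reduce logic mirrors ReduceJoinReducer above.

```java
import java.util.*;

public class ReduceJoinTrace {

    // Simulate the shuffle: group the tagged map outputs by key, the way
    // Hadoop collects them before handing each group to reduce().
    static Map<String, List<String>> shuffle() {
        String[][] mapOutput = {
            {"001", "sales\t35.99"},  {"002", "sales\t12.49"},
            {"004", "sales\t13.42"},  {"003", "sales\t499.99"},
            {"001", "sales\t78.95"},  {"002", "sales\t21.99"},
            {"002", "sales\t93.45"},  {"001", "sales\t9.99"},
            {"001", "accounts\tJohn Allen"},   {"002", "accounts\tAbigail Smith"},
            {"003", "accounts\tApril Stevens"},{"004", "accounts\tNasser Hafez"},
        };
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapOutput)
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        return grouped;
    }

    // Same logic as ReduceJoinReducer.reduce(): one call per key.
    // "accounts" values supply the name; "sales" values are counted and summed.
    static String reduce(List<String> values) {
        String name = "";
        double total = 0.0;
        int count = 0;
        for (String v : values) {
            String[] parts = v.split("\t");
            if (parts[0].equals("sales")) {
                count++;
                total += Float.parseFloat(parts[1]);  // float precision explains
                                                      // totals like 124.929998
            } else if (parts[0].equals("accounts")) {
                name = parts[1];
            }
        }
        // Locale pinned so the decimal format matches the posted output
        return String.format(Locale.US, "%s\t%d\t%f", name, count, total);
    }

    public static void main(String[] args) {
        // e.g. key 001 gets ["sales\t35.99","sales\t78.95","sales\t9.99",
        // "accounts\tJohn Allen"] -> "John Allen\t3\t124.929998"
        for (Map.Entry<String, List<String>> e : shuffle().entrySet())
            System.out.println(reduce(e.getValue()));
    }
}
```

Running main prints the four joined records, matching the job's part-r-00000 output: each account id's group contains exactly one "accounts" value (the name) plus all of that customer's "sales" values, so reduce() can emit name, sale count, and total in one pass.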