[jira] [Created] (HADOOP-9764) MapReduce output issue

Mullangi (JIRA) Tue, 23 Jul 2013 01:33:56 -0700

Mullangi created HADOOP-9764:
--------------------------------

             Summary: MapReduce output issue
                 Key: HADOOP-9764
                 URL: https://issues.apache.org/jira/browse/HADOOP-9764
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 1.0.3
         Environment: ubuntu
            Reporter: Mullangi



Hi,

I am new to Hadoop concepts.
While practicing with a custom MapReduce program, I found that the result is
not as expected when the job reads its input from a file on HDFS. When I run
the same program against a file on the local Unix file system, I get the
expected result.
The details of my code are below.

MapReduce in Java
=================

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.*;

public class WordCount1 {

    public static class Map extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      // Emits (token, 1) for every whitespace-separated token in the line.
      public void map(LongWritable key, Text value,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
          word.set(tokenizer.nextToken());
          output.collect(word, one);
        }
      }
    }

    public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
      // Sums the values for a key and emits the key only when the sum exceeds 1.
      public void reduce(Text key, Iterator<IntWritable> values,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        if (sum > 1) {
          output.collect(key, new IntWritable(sum));
        }
      }
    }

    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf();
      conf.setJarByClass(WordCount1.class);
      conf.setJobName("wordcount1");
      
      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(IntWritable.class);

      conf.setMapperClass(Map.class);
      // Reduce, including its sum > 1 check, is registered as both combiner and reducer.
      conf.setCombinerClass(Reduce.class);
      conf.setReducerClass(Reduce.class);

      conf.setInputFormat(TextInputFormat.class);
      conf.setOutputFormat(TextOutputFormat.class);
      
      // args[0] is the input path; args[1] is the output directory, which must not already exist.
      Path inPath = new Path(args[0]);
      Path outPath = new Path(args[1]);

      FileInputFormat.setInputPaths(conf, inPath);
      FileOutputFormat.setOutputPath(conf, outPath);

      JobClient.runJob(conf);
    }
  
}


Input file
==========
test my program
during test and my hadoop 
your during
get program


Hadoop-generated output file on HDFS
====================================
during  2
my      2
test    2

Hadoop-generated output file on the local file system
=====================================================
during  2
my      2
program 2
test    2

Please help me with this issue.
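
One possible explanation, offered as a hypothesis and not confirmed anywhere
in this report: Reduce is also registered as the combiner, so its sum > 1
check runs once per map task on partial counts before the reducer sees them.
If the HDFS run used two map tasks and the two occurrences of "program" (first
and last input lines) landed in different splits, each partial count of 1
would be discarded. The sketch below simulates this in plain Java with no
Hadoop dependency. The class CombinerFilterDemo, its helper methods, and the
split boundary (lines 1-3 and line 4) are all assumptions chosen for
illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-alone simulation; it does not use any Hadoop classes.
public class CombinerFilterDemo {

    // The four input lines from the report.
    static final List<String> INPUT = Arrays.asList(
        "test my program",
        "during test and my hadoop ",
        "your during",
        "get program");

    // Counts tokens in the given lines, like one map task followed by grouping.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Mirrors the sum > 1 check in the posted Reduce class.
    static Map<String, Integer> keepSumsAboveOne(Map<String, Integer> counts) {
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    // Treats each split as one map task, applies the filter as a combiner
    // per split, merges the survivors, then applies the filter as the reducer.
    static Map<String, Integer> run(List<List<String>> splits) {
        Map<String, Integer> merged = new TreeMap<>();
        for (List<String> split : splits) {
            Map<String, Integer> combined = keepSumsAboveOne(countWords(split));
            combined.forEach((word, count) -> merged.merge(word, count, Integer::sum));
        }
        return keepSumsAboveOne(merged);
    }

    public static void main(String[] args) {
        // Whole file in one split (assumed to resemble the local run).
        System.out.println(run(Arrays.asList(INPUT)));
        // prints {during=2, my=2, program=2, test=2}

        // Assumed two-way split: lines 1-3 in one map task, line 4 in another.
        System.out.println(run(Arrays.asList(INPUT.subList(0, 3), INPUT.subList(3, 4))));
        // prints {during=2, my=2, test=2}
    }
}
```

In this simulation the two results match the two outputs listed above. A
combiner that only sums, with the sum > 1 check left to the reducer, would not
depend on how the input is split.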


