[jira] [Commented] (MAPREDUCE-4507) IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict

2012-08-03 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427993#comment-13427993
 ] 

Harsh J commented on MAPREDUCE-4507:


The {{map()}} function is to be properly overriden when using the new API. 
Using @Override annotations on map() (and for that matter, reduce() too) will 
help you catch your mistake here.

As discussed on http://search-hadoop.com/m/hSxqz1vsQPc, this is a user-side 
mistake, but in no way a bug. See 
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Mapper.html#map(KEYIN,%20VALUEIN,%20org.apache.hadoop.mapreduce.Mapper.Context).

We can add a javadoc improvement (and a tutorial improvement) to state the 
right answer to avoiding this issue: Always use @Override annotations when 
overriding methods. (Any IDE today provides support for this).

 IdentityMapper is being triggered when the type of the Input Key at class 
 level and method level has a conflict
 ---

 Key: MAPREDUCE-4507
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4507
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.0.3
 Environment: linux ubuntu
Reporter: Bejoy KS

 If we use the default InputFormat (TextInputFormat) but specify the Key type 
 in mapper as IntWritable instead of Long Writable. The framework is supposed 
 throw a class cast exception.Such an exception is thrown only if the key 
 types at class level and method level are the same (IntWritable). But if we 
 provide the Input key type as IntWritable on the class level but LongWritable 
 on the method level (map method), instead of throwing a compile time error, 
 the code compliles fine . In addition to it on execution the framework 
 triggers Identity Mapper instead of the custom mapper provided with the 
 configuration. In this case the 'mapreduce.map.class' in job.xml shows mapper 
 as Custom Mapper itself , it should show IdentityMapper in cases where 
 IdentityMapper is triggered to avoid confusion and easy debugging.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4507) IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict

2012-08-02 Thread Bejoy KS (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427695#comment-13427695
 ] 

Bejoy KS commented on MAPREDUCE-4507:
-

This piece of code will trigger IdentityReducer. No compile time errors thrown 
even though the Input Key Type is not matching at Class and Method levels 

Main Class
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class WcNewMain extends Configured implements Tool
{
  public int run(String[] args) throws Exception
  {
//getting configuration object and setting job name
Configuration conf = getConf();
Job job = new Job(conf, Word Count );
  
//setting the class names
job.setJarByClass(WcNewMain.class);
job.setMapperClass(WcMapperNew.class);
//job.setReducerClass(WordCountReducer.class);
job.setNumReduceTasks(0);

//setting the output data type classes
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

  

FileInputFormat.addInputPath(job, new 
Path(hdfs://localhost:9000/userdata/bejoy/samples/wc/input));
FileOutputFormat.setOutputPath(job, new 
Path(hdfs://localhost:9000/userdata/bejoy/samples/wc/output));

return job.waitForCompletion(true) ? 0 : 1;
}

public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new WcNewMain(), args);
System.exit(res);
}
}


{code}

Mapper Class
{code}
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WcMapperNew extends MapperIntWritable, Text, Text, IntWritable
{
//hadoop supported data types
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();
   
   public void map(LongWritable key, Text value, Context context) 
throws IOException, InterruptedException
   {
 //taking one line at a time and tokenizing the same
   String line = value.toString();
   StringTokenizer tokenizer = new StringTokenizer(line);
   
 //iterating through all the words available in that line and 
forming the key value pair
   while (tokenizer.hasMoreTokens())
   {
  word.set(tokenizer.nextToken());
  //sending to output collector which inturn passes the same to 
reducer
  context.write(word, one);
   }
   }
   
 }
{code}

 IdentityMapper is being triggered when the type of the Input Key at class 
 level and method level has a conflict
 ---

 Key: MAPREDUCE-4507
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4507
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.0.3
 Environment: linux ubuntu
Reporter: Bejoy KS

 If we use the default InputFormat (TextInputFormat) but specify the Key type 
 in mapper as IntWritable instead of Long Writable. The framework is supposed 
 throw a class cast exception.Such an exception is thrown only if the key 
 types at class level and method level are the same (IntWritable). But if we 
 provide the Input key type as IntWritable on the class level but LongWritable 
 on the method level (map method), instead of throwing a compile time error, 
 the code compliles fine . In addition to it on execution the framework 
 triggers Identity Mapper instead of the custom mapper provided with the 
 configuration. In this case the 'mapreduce.map.class' in job.xml shows mapper 
 as Custom Mapper itself , it should show IdentityMapper in cases where 
 IdentityMapper is triggered to avoid confusion and easy debugging.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: 

[jira] [Commented] (MAPREDUCE-4507) IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict

2012-08-02 Thread Bejoy KS (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427701#comment-13427701
 ] 

Bejoy KS commented on MAPREDUCE-4507:
-

Sorry About the typo

This piece of code will trigger ***{color:blue} IdentityMapper{color} 

 IdentityMapper is being triggered when the type of the Input Key at class 
 level and method level has a conflict
 ---

 Key: MAPREDUCE-4507
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4507
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.0.3
 Environment: linux ubuntu
Reporter: Bejoy KS

 If we use the default InputFormat (TextInputFormat) but specify the Key type 
 in mapper as IntWritable instead of Long Writable. The framework is supposed 
 throw a class cast exception.Such an exception is thrown only if the key 
 types at class level and method level are the same (IntWritable). But if we 
 provide the Input key type as IntWritable on the class level but LongWritable 
 on the method level (map method), instead of throwing a compile time error, 
 the code compliles fine . In addition to it on execution the framework 
 triggers Identity Mapper instead of the custom mapper provided with the 
 configuration. In this case the 'mapreduce.map.class' in job.xml shows mapper 
 as Custom Mapper itself , it should show IdentityMapper in cases where 
 IdentityMapper is triggered to avoid confusion and easy debugging.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4507) IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict

2012-08-02 Thread Bejoy KS (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427709#comment-13427709
 ] 

Bejoy KS commented on MAPREDUCE-4507:
-

By just changing the Input Key Type to IntWritable in map method level. The 
code throws 

{code}
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast 
to org.apache.hadoop.io.IntWritable

{code}

Modified Mapper Class

{code}
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WcMapperNew extends MapperIntWritable, Text, Text, IntWritable
{
//hadoop supported data types
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();
   
   public void map(IntWritable key, Text value, Context context) throws 
IOException, InterruptedException
   {
 //taking one line at a time and tokenizing the same
   String line = value.toString();
   StringTokenizer tokenizer = new StringTokenizer(line);
   
 //iterating through all the words available in that line and 
forming the key value pair
   while (tokenizer.hasMoreTokens())
   {
  word.set(tokenizer.nextToken());
  //sending to output collector which inturn passes the same to 
reducer
  context.write(word, one);
   }
   }
   
 }
{code}

 IdentityMapper is being triggered when the type of the Input Key at class 
 level and method level has a conflict
 ---

 Key: MAPREDUCE-4507
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4507
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.0.3
 Environment: linux ubuntu
Reporter: Bejoy KS

 If we use the default InputFormat (TextInputFormat) but specify the Key type 
 in mapper as IntWritable instead of Long Writable. The framework is supposed 
 throw a class cast exception.Such an exception is thrown only if the key 
 types at class level and method level are the same (IntWritable). But if we 
 provide the Input key type as IntWritable on the class level but LongWritable 
 on the method level (map method), instead of throwing a compile time error, 
 the code compliles fine . In addition to it on execution the framework 
 triggers Identity Mapper instead of the custom mapper provided with the 
 configuration. In this case the 'mapreduce.map.class' in job.xml shows mapper 
 as Custom Mapper itself , it should show IdentityMapper in cases where 
 IdentityMapper is triggered to avoid confusion and easy debugging.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira