[jira] [Commented] (MAPREDUCE-4507) IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict
[ https://issues.apache.org/jira/browse/MAPREDUCE-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427993#comment-13427993 ] Harsh J commented on MAPREDUCE-4507: The {{map()}} function is to be properly overriden when using the new API. Using @Override annotations on map() (and for that matter, reduce() too) will help you catch your mistake here. As discussed on http://search-hadoop.com/m/hSxqz1vsQPc, this is a user-side mistake, but in no way a bug. See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Mapper.html#map(KEYIN,%20VALUEIN,%20org.apache.hadoop.mapreduce.Mapper.Context). We can add a javadoc improvement (and a tutorial improvement) to state the right answer to avoiding this issue: Always use @Override annotations when overriding methods. (Any IDE today provides support for this). IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict --- Key: MAPREDUCE-4507 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4507 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.0.3 Environment: linux ubuntu Reporter: Bejoy KS If we use the default InputFormat (TextInputFormat) but specify the Key type in mapper as IntWritable instead of Long Writable. The framework is supposed throw a class cast exception.Such an exception is thrown only if the key types at class level and method level are the same (IntWritable). But if we provide the Input key type as IntWritable on the class level but LongWritable on the method level (map method), instead of throwing a compile time error, the code compliles fine . In addition to it on execution the framework triggers Identity Mapper instead of the custom mapper provided with the configuration. In this case the 'mapreduce.map.class' in job.xml shows mapper as Custom Mapper itself , it should show IdentityMapper in cases where IdentityMapper is triggered to avoid confusion and easy debugging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4507) IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict
[ https://issues.apache.org/jira/browse/MAPREDUCE-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427695#comment-13427695 ] Bejoy KS commented on MAPREDUCE-4507: - This piece of code will trigger IdentityReducer. No compile time errors thrown even though the Input Key Type is not matching at Class and Method levels Main Class {code} import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.conf.Configured; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.hadoop.util.Tool; import org.apache.hadoop.util.ToolRunner; public class WcNewMain extends Configured implements Tool { public int run(String[] args) throws Exception { //getting configuration object and setting job name Configuration conf = getConf(); Job job = new Job(conf, Word Count ); //setting the class names job.setJarByClass(WcNewMain.class); job.setMapperClass(WcMapperNew.class); //job.setReducerClass(WordCountReducer.class); job.setNumReduceTasks(0); //setting the output data type classes job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); FileInputFormat.addInputPath(job, new Path(hdfs://localhost:9000/userdata/bejoy/samples/wc/input)); FileOutputFormat.setOutputPath(job, new Path(hdfs://localhost:9000/userdata/bejoy/samples/wc/output)); return job.waitForCompletion(true) ? 0 : 1; } public static void main(String[] args) throws Exception { int res = ToolRunner.run(new Configuration(), new WcNewMain(), args); System.exit(res); } } {code} Mapper Class {code} import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Mapper; public class WcMapperNew extends MapperIntWritable, Text, Text, IntWritable { //hadoop supported data types private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { //taking one line at a time and tokenizing the same String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); //iterating through all the words available in that line and forming the key value pair while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); //sending to output collector which inturn passes the same to reducer context.write(word, one); } } } {code} IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict --- Key: MAPREDUCE-4507 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4507 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.0.3 Environment: linux ubuntu Reporter: Bejoy KS If we use the default InputFormat (TextInputFormat) but specify the Key type in mapper as IntWritable instead of Long Writable. The framework is supposed throw a class cast exception.Such an exception is thrown only if the key types at class level and method level are the same (IntWritable). But if we provide the Input key type as IntWritable on the class level but LongWritable on the method level (map method), instead of throwing a compile time error, the code compliles fine . In addition to it on execution the framework triggers Identity Mapper instead of the custom mapper provided with the configuration. In this case the 'mapreduce.map.class' in job.xml shows mapper as Custom Mapper itself , it should show IdentityMapper in cases where IdentityMapper is triggered to avoid confusion and easy debugging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see:
[jira] [Commented] (MAPREDUCE-4507) IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict
[ https://issues.apache.org/jira/browse/MAPREDUCE-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427701#comment-13427701 ] Bejoy KS commented on MAPREDUCE-4507: - Sorry About the typo This piece of code will trigger ***{color:blue} IdentityMapper{color} IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict --- Key: MAPREDUCE-4507 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4507 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.0.3 Environment: linux ubuntu Reporter: Bejoy KS If we use the default InputFormat (TextInputFormat) but specify the Key type in mapper as IntWritable instead of Long Writable. The framework is supposed throw a class cast exception.Such an exception is thrown only if the key types at class level and method level are the same (IntWritable). But if we provide the Input key type as IntWritable on the class level but LongWritable on the method level (map method), instead of throwing a compile time error, the code compliles fine . In addition to it on execution the framework triggers Identity Mapper instead of the custom mapper provided with the configuration. In this case the 'mapreduce.map.class' in job.xml shows mapper as Custom Mapper itself , it should show IdentityMapper in cases where IdentityMapper is triggered to avoid confusion and easy debugging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4507) IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict
[ https://issues.apache.org/jira/browse/MAPREDUCE-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427709#comment-13427709 ] Bejoy KS commented on MAPREDUCE-4507: - By just changing the Input Key Type to IntWritable in map method level. The code throws {code} java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable {code} Modified Mapper Class {code} import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Mapper; public class WcMapperNew extends MapperIntWritable, Text, Text, IntWritable { //hadoop supported data types private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(IntWritable key, Text value, Context context) throws IOException, InterruptedException { //taking one line at a time and tokenizing the same String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); //iterating through all the words available in that line and forming the key value pair while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); //sending to output collector which inturn passes the same to reducer context.write(word, one); } } } {code} IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict --- Key: MAPREDUCE-4507 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4507 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.0.3 Environment: linux ubuntu Reporter: Bejoy KS If we use the default InputFormat (TextInputFormat) but specify the Key type in mapper as IntWritable instead of Long Writable. The framework is supposed throw a class cast exception.Such an exception is thrown only if the key types at class level and method level are the same (IntWritable). But if we provide the Input key type as IntWritable on the class level but LongWritable on the method level (map method), instead of throwing a compile time error, the code compliles fine . In addition to it on execution the framework triggers Identity Mapper instead of the custom mapper provided with the configuration. In this case the 'mapreduce.map.class' in job.xml shows mapper as Custom Mapper itself , it should show IdentityMapper in cases where IdentityMapper is triggered to avoid confusion and easy debugging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira