Streaming: org.apache.hadoop.mapred.lib.IdentityMapper should not insert unnecessary keys
-----------------------------------------------------------------------------------------

                 Key: HADOOP-2433
                 URL: https://issues.apache.org/jira/browse/HADOOP-2433
             Project: Hadoop
          Issue Type: Bug
          Components: contrib/streaming
            Reporter: arkady borkovsky


When the streaming command specifies -mapper org.apache.hadoop.mapred.lib.IdentityMapper, the reducer should receive exactly the same text lines as were present in the input. The only modification should be the reordering of the input.

Currently, org.apache.hadoop.mapred.lib.IdentityMapper inserts offsets into the input as keys, which renders it useless.

Moreover, in the latest release org.apache.hadoop.mapred.lib.IdentityMapper just crashes:

>java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:331)
        at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

(I am opening only one bug: since it was broken anyway, the new behavior does not actually make it any worse than before.)
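
To illustrate the report, here is a rough, dependency-free simulation in Python. It does not use or reproduce any Hadoop code. All function names are hypothetical, and it assumes that the line-oriented input format keys each line by its byte offset and that key and value are joined with a tab on output. It shows how an identity mapper passes the offset keys through, what the requested output would look like, and why a collector that expects text keys would reject the integer offsets.

```python
from typing import Iterable, Iterator, List, Tuple


def read_text_records(data: str) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, line) pairs.

    Assumption: this mimics a line-oriented input format that uses the
    byte offset of each line as the record key.
    """
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line.rstrip("\r\n")
        offset += len(line.encode("utf-8"))


def identity_map(records: Iterable[Tuple[object, str]]) -> Iterator[Tuple[object, str]]:
    """Pass every (key, value) pair through unchanged."""
    for key, value in records:
        yield key, value


def reported_output(data: str) -> List[str]:
    """Reported behavior: the offset key ends up in front of each line."""
    return [f"{key}\t{value}" for key, value in identity_map(read_text_records(data))]


def requested_output(data: str) -> List[str]:
    """Requested behavior (hypothetical fix): only the original lines survive."""
    return [value for _key, value in identity_map(read_text_records(data))]


def check_key_types(records: Iterable[Tuple[object, str]], expected: type) -> Iterator[Tuple[object, str]]:
    """Reject keys of the wrong type, loosely modelling the reported crash."""
    for key, value in records:
        if not isinstance(key, expected):
            raise TypeError(
                f"Type mismatch in key from map: expected {expected.__name__}, "
                f"received {type(key).__name__}"
            )
        yield key, value


if __name__ == "__main__":
    sample = "a\nbb\n"
    print(reported_output(sample))   # lines prefixed with their offsets
    print(requested_output(sample))  # lines exactly as in the input
```

With the sample input above, `reported_output` yields `0<TAB>a` and `2<TAB>bb`, while `requested_output` yields `a` and `bb`. Feeding `identity_map(read_text_records(sample))` into `check_key_types(..., str)` raises a `TypeError`, which is the simulated counterpart of the type mismatch in the stack trace.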