Streaming: org.apache.hadoop.mapred.lib.IdentityMapper should not insert
unnecessary keys
-------------------------------------------------------------------------------------------
Key: HADOOP-2433
URL: https://issues.apache.org/jira/browse/HADOOP-2433
Project: Hadoop
Issue Type: Bug
Components: contrib/streaming
Reporter: arkady borkovsky
When a streaming command specifies
-mapper org.apache.hadoop.mapred.lib.IdentityMapper
the reducer should receive exactly the same text lines as were present in the
input.
The only modification should be the reordering of the input lines.
Currently, org.apache.hadoop.mapred.lib.IdentityMapper inserts offsets into the
input as keys, which renders it useless.
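
To make the difference concrete, here is a minimal, self-contained sketch that
does not use Hadoop at all. It models a line reader that hands the mapper
(byte offset, line) pairs, then compares an identity map that passes the offset
through as the key with the behavior requested above. The class and method
names (IdentityMapperSketch, readRecords, identityWithOffsets,
identityLinesOnly) are hypothetical, and joining key and value with a tab is an
assumption about how the pair is presented to the reducer.

```java
import java.util.ArrayList;
import java.util.List;

public class IdentityMapperSketch {

    /** One input record: the byte offset of a line and the line's text. */
    public static final class Record {
        public final long offset;
        public final String line;

        public Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /**
     * Splits newline-terminated text into (offset, line) records.
     * Assumes ASCII input so that character counts equal byte counts.
     */
    public static List<Record> readRecords(String input) {
        List<Record> records = new ArrayList<>();
        if (input.isEmpty()) {
            return records;
        }
        long offset = 0;
        for (String line : input.split("\n")) {
            records.add(new Record(offset, line));
            offset += line.length() + 1; // +1 accounts for the newline
        }
        return records;
    }

    /** Identity map as reported: the offset key travels along with the line. */
    public static List<String> identityWithOffsets(String input) {
        List<String> out = new ArrayList<>();
        for (Record r : readRecords(input)) {
            out.add(r.offset + "\t" + r.line); // assumed tab-joined key/value
        }
        return out;
    }

    /** Requested behavior: each line reaches the reducer unchanged. */
    public static List<String> identityLinesOnly(String input) {
        List<String> out = new ArrayList<>();
        for (Record r : readRecords(input)) {
            out.add(r.line);
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "apple\t1\nbanana\t2\n";
        System.out.println(identityWithOffsets(input));
        System.out.println(identityLinesOnly(input));
    }
}
```

In the first variant every line gains a leading numeric field, so a reducer
that expects the original first field as its key sees the offset instead.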
Moreover, in the latest release org.apache.hadoop.mapred.lib.IdentityMapper
just crashes:
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:331)
        at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
(I am opening only one bug: since the mapper was already broken, the new
behavior does not actually make it any worse than before.)
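
The exception message suggests that the map output key's class is compared
with a declared key class. The following is a simplified model of such a check,
not Hadoop's code: Text and LongWritable here are local stand-ins for the
org.apache.hadoop.io classes, Collector is a hypothetical name, and the message
wording is only loosely modeled on the trace above.

```java
import java.io.IOException;

public class KeyTypeCheckSketch {

    /** Local stand-in for a text key or value (not the Hadoop class). */
    public static final class Text {
        public final String value;

        public Text(String value) {
            this.value = value;
        }
    }

    /** Local stand-in for a long key such as a byte offset (not the Hadoop class). */
    public static final class LongWritable {
        public final long value;

        public LongWritable(long value) {
            this.value = value;
        }
    }

    /** Hypothetical collector that rejects keys of an unexpected class. */
    public static final class Collector {
        private final Class<?> keyClass;
        private int accepted = 0;

        public Collector(Class<?> keyClass) {
            this.keyClass = keyClass;
        }

        public void collect(Object key, Object value) throws IOException {
            if (key.getClass() != keyClass) {
                throw new IOException("Type mismatch in key from map: expected "
                        + keyClass.getName() + ", received " + key.getClass().getName());
            }
            accepted++; // a real collector would buffer the pair here
        }

        public int acceptedCount() {
            return accepted;
        }
    }

    public static void main(String[] args) {
        Collector collector = new Collector(Text.class); // job declares Text keys
        try {
            // An identity map forwards the reader's offset key unchanged.
            collector.collect(new LongWritable(0L), new Text("first line"));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, an identity map fails as soon as the declared key class is
Text while the reader supplies offset keys, which matches the first frames of
the trace.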
--
This message is automatically generated by JIRA.