Streaming: org.apache.hadoop.mapred.lib.IdentityMapper should not insert 
unnecessary keys
-------------------------------------------------------------------------------------------

                 Key: HADOOP-2433
                 URL: https://issues.apache.org/jira/browse/HADOOP-2433
             Project: Hadoop
          Issue Type: Bug
          Components: contrib/streaming
            Reporter: arkady borkovsky


When a streaming command specifies 
-mapper org.apache.hadoop.mapred.lib.IdentityMapper
the reducer should receive exactly the same text lines as were present in the 
input.
The only modification should be the reordering of the input.
Currently, org.apache.hadoop.mapred.lib.IdentityMapper inserts offsets into the 
input as keys, which renders it useless.
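
To make the difference concrete, here is a minimal sketch in plain Java (no Hadoop classes). It assumes that the input format hands the mapper one (byte offset, line) pair per input line and that streaming passes key and value to the reducer separated by a tab. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class IdentityMapperSketch {

    // Models the records handed to the mapper: key = byte offset of the
    // line start, value = the line text. Assumes single-byte characters
    // and '\n' line terminators.
    public static List<String[]> readRecords(String input) {
        List<String[]> records = new ArrayList<>();
        long offset = 0;
        for (String line : input.split("\n")) {
            records.add(new String[] { Long.toString(offset), line });
            offset += line.length() + 1; // +1 for the newline
        }
        return records;
    }

    // Behavior described in this report: the identity mapper passes
    // (offset, line) through, so the offset ends up in front of each line.
    public static List<String> identityWithOffsets(String input) {
        List<String> out = new ArrayList<>();
        for (String[] r : readRecords(input)) {
            out.add(r[0] + "\t" + r[1]);
        }
        return out;
    }

    // Requested behavior: only the original line reaches the reducer.
    public static List<String> identityLinesOnly(String input) {
        List<String> out = new ArrayList<>();
        for (String[] r : readRecords(input)) {
            out.add(r[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "b 2\na 1\n";
        System.out.println(identityWithOffsets(input));
        System.out.println(identityLinesOnly(input));
    }
}
```

With the two-line input above, the first method yields lines prefixed with the offsets 0 and 4, while the second yields the input lines unchanged.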

Moreover, in the latest release org.apache.hadoop.mapred.lib.IdentityMapper 
just crashes:
>java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:331)
        at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
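
The exception is consistent with a collector that compares the class of each emitted key against the configured map output key class. The following is a simplified model of such a check, not Hadoop's actual code: String stands in for Text, Long stands in for LongWritable, and the class name, method name, and message wording are hypothetical.

```java
import java.io.IOException;

public class KeyTypeCheckSketch {

    // Rejects any key whose runtime class differs from the expected one,
    // the way a strictly typed map output collector might.
    public static void collect(Class<?> expectedKeyClass, Object key) throws IOException {
        if (key.getClass() != expectedKeyClass) {
            throw new IOException("Type mismatch in key from map: expected "
                    + expectedKeyClass.getName() + ", got " + key.getClass().getName());
        }
    }

    public static void main(String[] args) {
        try {
            // An identity mapper forwards the offset key unchanged, so a
            // Long arrives where a String (the stand-in for Text) is expected.
            collect(String.class, Long.valueOf(0L));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, an identity mapper can only succeed if the job's declared map output key type matches the key type produced by the input format.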

(I am opening only one bug: since the mapper was already broken, the new 
behavior does not actually make it any worse than before.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
