[ https://issues.apache.org/jira/browse/HADOOP-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552032 ]
Owen O'Malley commented on HADOOP-2433:
---------------------------------------
The problem, of course, is that TextInputFormat returns the offset as the key and
the line as the value.
How about creating a new InputFormat (maybe named LineInputFormat?) that
returns the line as a Text key and a NullWritable value? With the change I'm putting into
HADOOP-2425, IdentityMapper and IdentityReducer would then precisely sort
the input files. I would even propose that we eventually make LineInputFormat the default,
after deprecating TextInputFormat.
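To illustrate why the key matters, here is a plain-Java simulation, not Hadoop code. It models the shuffle's sort-by-key with a TreeMap and the identity map/reduce as pass-throughs. It assumes ASCII input (so String ordering matches Text's byte ordering) and renders records as key, tab, value. The class and method names are hypothetical, and LineInputFormat is only the proposed name above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation: no Hadoop classes are used.
public class IdentitySortDemo {

    // TextInputFormat-style records: key = byte offset, value = line.
    // Sorting by offset just reproduces the file order.
    static List<String> sortWithOffsetKeys(List<String> lines) {
        TreeMap<Long, String> shuffled = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            shuffled.put(offset, line);          // offsets are unique per line
            offset += line.length() + 1;         // +1 for '\n'; assumes single-byte chars
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : shuffled.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue()); // offset leaks into the output
        }
        return out;
    }

    // Proposed LineInputFormat-style records: key = line, value = null (NullWritable).
    // Sorting by key sorts the lines themselves; duplicates are kept via a count.
    static List<String> sortWithLineKeys(List<String> lines) {
        TreeMap<String, Integer> shuffled = new TreeMap<>();
        for (String line : lines) {
            shuffled.merge(line, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : shuffled.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("pear", "apple", "fig", "apple");
        System.out.println(sortWithOffsetKeys(input));
        System.out.println(sortWithLineKeys(input));
    }
}
```

With the input above, the offset-keyed run keeps the file order and prefixes each line with its offset (0, 5, 11, 15), while the line-keyed run yields [apple, apple, fig, pear], which is the "precise sort" behavior described in the comment.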
> Streaming: org.apache.hadoop.mapred.lib.IdentityMapper should not insert unnecessary keys
> -------------------------------------------------------------------------------------------
>
> Key: HADOOP-2433
> URL: https://issues.apache.org/jira/browse/HADOOP-2433
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/streaming
> Reporter: arkady borkovsky
>
> When a streaming command specifies
> -mapper org.apache.hadoop.mapred.lib.IdentityMapper
> the reducer should receive exactly the same text lines as were present in
> the input.
> The only modification should be the reordering of the input.
> Currently, org.apache.hadoop.mapred.lib.IdentityMapper inserts offsets in the
> input as keys, which renders it useless.
> Moreover, in the latest release org.apache.hadoop.mapred.lib.IdentityMapper
> just crashes:
> > java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:331)
> >     at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
> >     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> (I am opening only one bug: since it was broken anyway, the new behavior does not
> actually make it any worse than before.)
--
This message is automatically generated by JIRA.