streaming should default to KeyValueTextInputFormat with IdentityMapper
-----------------------------------------------------------------------
Key: HADOOP-3306
URL: https://issues.apache.org/jira/browse/HADOOP-3306
Project: Hadoop Core
Issue Type: Improvement
Components: contrib/streaming
Affects Versions: 0.15.3
Reporter: Joydeep Sen Sarma
Priority: Minor
In 0.15.3, streaming defaults to TextInputFormat when no -inputformat option
is given. This is great when the PipeMapper is used, but in many cases people
want to use an IdentityMapper, and the default fails with the IdentityMapper:
a) the map output key type becomes LongWritable (but Hadoop has already
defaulted to expect Text)
b) the map output key is the position of the line in the file (a byte offset
rather than a true line number), and intuitively this is not what the user
expects (almost no one wants to use the line position as the map key).
If we could simply default to KeyValueTextInputFormat with IdentityMapper,
that would resolve both of these problems. This would change default behavior,
though, so I am a little leery.
Using '-mapper cat' is the common workaround, but it seems like a needless
waste of resources.
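
To make problems (a) and (b) concrete, here is a rough sketch of how the two
input formats turn a line into a (key, value) pair. It does not use the Hadoop
classes; the class and method names (InputFormatSketch, textRecords,
keyValueRecords) are made up for illustration, and the behavior is simplified
to single-byte '\n' line endings and a tab separator.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the two input formats; not the Hadoop classes.
public class InputFormatSketch {

    // TextInputFormat-like: key = byte offset of the line start, value = line.
    public static List<Map.Entry<Long, String>> textRecords(String data) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        long offset = 0;
        for (String line : data.split("\n")) {
            out.add(new SimpleEntry<>(offset, line));
            // Advance past the line's bytes plus the '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    // KeyValueTextInputFormat-like: key = text before the first tab,
    // value = the rest. A line without a tab becomes (line, "").
    public static List<Map.Entry<String, String>> keyValueRecords(String data) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String line : data.split("\n")) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                out.add(new SimpleEntry<>(line, ""));
            } else {
                out.add(new SimpleEntry<>(line.substring(0, tab),
                                          line.substring(tab + 1)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "user1\tclick\nuser2\tview\n";
        // With an identity mapper these pairs are the map output as-is.
        System.out.println("TextInputFormat-like:         " + textRecords(data));
        System.out.println("KeyValueTextInputFormat-like: " + keyValueRecords(data));
    }
}
```

With an identity mapper, the first form passes a numeric position through as
the map output key, while the second passes the user's own text key, which is
what the proposal above is after.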