streaming should default to KeyValueTextInputFormat with IdentityMapper
-----------------------------------------------------------------------

                 Key: HADOOP-3306
                 URL: https://issues.apache.org/jira/browse/HADOOP-3306
             Project: Hadoop Core
          Issue Type: Improvement
          Components: contrib/streaming
    Affects Versions: 0.15.3
            Reporter: Joydeep Sen Sarma
            Priority: Minor


In 0.15.3, streaming defaults to TextInputFormat when no -inputformat option 
is given.

This is fine when the PipeMapper is used. But in many cases people want to 
use an IdentityMapper, and that fails with the default input format:
a) the map output key type becomes LongWritable (but hadoop has already 
defaulted to expect Text)
b) the map output key is the byte offset of the line within the file, and 
intuitively this is not what the user expects (almost no one wants to use the 
line position as the map key).
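To make the difference between the two input formats concrete, here is a minimal sketch in plain Python. It does not use any Hadoop API; the function names are made up for illustration, and the behavior is only a simulation of how each format is documented to produce records (offset/line versus a split at the first tab).

```python
def _lines_with_offsets(data: bytes):
    """Yield (byte_offset, line_text) for each line in data."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


def text_input_records(data: bytes):
    """Simulate TextInputFormat: key is the byte offset, value is the whole line."""
    return list(_lines_with_offsets(data))


def key_value_text_records(data: bytes, sep: str = "\t"):
    """Simulate KeyValueTextInputFormat: split each line at the first separator.

    A line without a separator becomes the key, with an empty value.
    """
    records = []
    for _, line in _lines_with_offsets(data):
        key, _, value = line.partition(sep)
        records.append((key, value))
    return records


if __name__ == "__main__":
    sample = b"apple\t3\nbanana\t7\n"
    # Integer keys here stand in for LongWritable, string keys for Text.
    print(text_input_records(sample))
    print(key_value_text_records(sample))
```

With an identity mapper, the first form passes integer offsets downstream as keys, while the second passes the text key the user most likely intended.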

If streaming simply defaulted to KeyValueTextInputFormat when the mapper is 
the IdentityMapper, that would resolve both of these problems. This would 
change default behavior, though, so I am a little leery of it.
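The proposed rule could be sketched roughly as follows. This is only an illustration in Python of the selection logic, not code from streaming itself; the function name and its parameters are hypothetical, and it assumes an explicit -inputformat always takes precedence.

```python
from typing import Optional

IDENTITY_MAPPER = "org.apache.hadoop.mapred.lib.IdentityMapper"
TEXT_INPUT = "org.apache.hadoop.mapred.TextInputFormat"
KEY_VALUE_TEXT_INPUT = "org.apache.hadoop.mapred.KeyValueTextInputFormat"


def default_input_format(mapper: Optional[str], inputformat: Optional[str]) -> str:
    """Pick the input format class name for a streaming job (hypothetical helper)."""
    if inputformat is not None:
        # An explicit -inputformat is never overridden.
        return inputformat
    if mapper == IDENTITY_MAPPER:
        # Proposed change: text keys instead of byte offsets.
        return KEY_VALUE_TEXT_INPUT
    # Existing default, unchanged for piped mappers.
    return TEXT_INPUT
```

Under this sketch, only jobs that name the IdentityMapper and omit -inputformat would see different behavior.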

Using '-mapper cat' is the common workaround, but it seems like a needless 
waste of resources, since every map task spawns an external process just to 
copy its input.
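A rough simulation of why the workaround gives the expected keys, assuming streaming feeds each input line to the mapper process and then splits each output line at the first tab into key and value. The function name is made up, and the sketch needs a `cat` binary on the PATH.

```python
import subprocess


def cat_mapper_records(lines):
    """Pipe lines through an external `cat`, then split output at the first tab."""
    stdin_data = "".join(line + "\n" for line in lines).encode("utf-8")
    # One extra process whose only job is to echo its input back.
    result = subprocess.run(
        ["cat"], input=stdin_data, stdout=subprocess.PIPE, check=True
    )
    records = []
    for line in result.stdout.decode("utf-8").splitlines():
        key, _, value = line.partition("\t")
        records.append((key, value))
    return records


if __name__ == "__main__":
    print(cat_mapper_records(["apple\t3", "banana\t7"]))
```

The records that come back are the same ones a key/value text input format would have produced directly, which is why the extra process and pipe traffic look avoidable.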

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
