Tutorial should mention setMapOutputKeyClass
--------------------------------------------

                 Key: MAPREDUCE-2064
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2064
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: documentation
    Affects Versions: 0.21.0
            Reporter: Clarence Gardner
            Priority: Minor


The official tutorial (mapred_tutorial.html), like every other tutorial I've 
seen on the web, shows a program whose mapper and reducer emit key/value pairs 
of the same datatypes. It shows a configuration call to 
Job.setOutput{Key,Value}Class but doesn't say that the call applies to both the 
mapper and the reducer; it reads as if it applies only to the reducer output. 
This could be mentioned in the "Job Configuration" section. Here is a possible 
addition, to go after the "The Job is used to specify ..." paragraph.

The job also configures the types of its key/value pairs with 
setOutputKeyClass(type) and setOutputValueClass(type), which apply to both the 
mapper and reducer classes. If the types output by the mapper and reducer are 
not the same, those calls should be followed by setMapOutputKeyClass(type) and 
setMapOutputValueClass(type).

(I'm assuming that at least a call to setOutput{Key,Value}Class is required.)
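The fallback rule in the proposed paragraph can be sketched as a small 
standalone model. This is only an illustration, not Hadoop code: the class 
OutputTypeFallback and its setting names are hypothetical, it merely borrows 
the four setter names from Job, and String, Integer and Double stand in for 
Writable types such as Text, IntWritable and DoubleWritable. As a 
simplification, the model returns null for a type that was never set.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a job's type settings. This is NOT Hadoop's Job class;
// it only imitates the fallback rule described in the proposed paragraph.
final class OutputTypeFallback {
    // Hypothetical setting names, chosen for this sketch only.
    private final Map<String, Class<?>> settings = new HashMap<>();

    void setOutputKeyClass(Class<?> c)      { settings.put("output.key", c); }
    void setOutputValueClass(Class<?> c)    { settings.put("output.value", c); }
    void setMapOutputKeyClass(Class<?> c)   { settings.put("map.output.key", c); }
    void setMapOutputValueClass(Class<?> c) { settings.put("map.output.value", c); }

    // Final (reducer) output types. Simplification: null when never set.
    Class<?> getOutputKeyClass()   { return settings.get("output.key"); }
    Class<?> getOutputValueClass() { return settings.get("output.value"); }

    // Mapper output types fall back to the job-wide types unless overridden.
    Class<?> getMapOutputKeyClass() {
        return settings.getOrDefault("map.output.key", getOutputKeyClass());
    }

    Class<?> getMapOutputValueClass() {
        return settings.getOrDefault("map.output.value", getOutputValueClass());
    }

    public static void main(String[] args) {
        OutputTypeFallback job = new OutputTypeFallback();

        // Only the job-wide types are set: the mapper inherits both.
        job.setOutputKeyClass(String.class);
        job.setOutputValueClass(Double.class);
        System.out.println(job.getMapOutputKeyClass().getSimpleName()
                + " / " + job.getMapOutputValueClass().getSimpleName());
        // prints "String / Double"

        // The mapper emits a different value type: override just that one.
        job.setMapOutputValueClass(Integer.class);
        System.out.println(job.getMapOutputKeyClass().getSimpleName()
                + " / " + job.getMapOutputValueClass().getSimpleName());
        // prints "String / Integer"
    }
}
```

In a real driver the same four setters would be called on the Job object, 
and only the map-side overrides that actually differ need to be given.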

