[ 
https://issues.apache.org/jira/browse/HADOOP-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Cutting updated HADOOP-920:
--------------------------------

    Status: Open  (was: Patch Available)

OutputFormats are only used when reducing, to generate the final output.  
They're not used when creating intermediate output.  So the bug here is that 
MapFileOutputFormat calls job.getMapOutput{Key,Value}Class(); those methods 
should only be called by the MapReduce kernel when generating intermediate 
output, and should not be called by an OutputFormat implementation.  This bug 
was introduced by HADOOP-115.
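A minimal sketch of the distinction, assuming a simplified model rather than Hadoop's real classes. `SimpleJobConf` and `FinalOutputWriter` are hypothetical stand-ins; only the four getter names mirror `JobConf`. The writer plays the role of what an OutputFormat creates: it takes its key/value classes from the final-output settings and rejects records of any other class. The exception type and message are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JobConf: only the four class getters matter here.
class SimpleJobConf {
    private final Class<?> mapOutputKeyClass, mapOutputValueClass;
    private final Class<?> outputKeyClass, outputValueClass;

    SimpleJobConf(Class<?> mapKey, Class<?> mapValue, Class<?> outKey, Class<?> outValue) {
        this.mapOutputKeyClass = mapKey;
        this.mapOutputValueClass = mapValue;
        this.outputKeyClass = outKey;
        this.outputValueClass = outValue;
    }

    // Meant for the MapReduce kernel when it writes intermediate output.
    Class<?> getMapOutputKeyClass() { return mapOutputKeyClass; }
    Class<?> getMapOutputValueClass() { return mapOutputValueClass; }

    // Meant for OutputFormat implementations when they write final output.
    Class<?> getOutputKeyClass() { return outputKeyClass; }
    Class<?> getOutputValueClass() { return outputValueClass; }
}

// Hypothetical final-output writer, standing in for what an OutputFormat returns.
class FinalOutputWriter {
    private final Class<?> keyClass, valueClass;
    final List<String> written = new ArrayList<>();

    FinalOutputWriter(SimpleJobConf job) {
        // Use the final output classes, NOT getMapOutput{Key,Value}Class().
        this.keyClass = job.getOutputKeyClass();
        this.valueClass = job.getOutputValueClass();
    }

    // Reject any record whose classes differ from the configured output classes.
    void write(Object key, Object value) {
        if (key.getClass() != keyClass) {
            throw new IllegalArgumentException("wrong key class: "
                + key.getClass().getName() + " is not " + keyClass.getName());
        }
        if (value.getClass() != valueClass) {
            throw new IllegalArgumentException("wrong value class: "
                + value.getClass().getName() + " is not " + valueClass.getName());
        }
        written.add(key + "\t" + value);
    }
}
```

For example, with map output classes `String`/`Integer` and final output classes `String`/`Long`, a reducer record `("a", 3L)` is accepted, while `("a", 3)` (an `Integer` value) is rejected.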

http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/MapFileOutputFormat.java?p2=%2Flucene%2Fhadoop%2Ftrunk%2Fsrc%2Fjava%2Forg%2Fapache%2Fhadoop%2Fmapred%2FMapFileOutputFormat.java&p1=%2Flucene%2Fhadoop%2Ftrunk%2Fsrc%2Fjava%2Forg%2Fapache%2Fhadoop%2Fmapred%2FMapFileOutputFormat.java&r1=407355&r2=407354&view=diff&pathrev=407355

I think the proper fix is to undo that change to this file.

> MapFileOutputFormat and SequenceFileOutputFormat use incorrect key/value 
> classes in map/reduce tasks
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-920
>                 URL: https://issues.apache.org/jira/browse/HADOOP-920
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.11.0
>            Reporter: Andrzej Bialecki 
>             Fix For: 0.11.0
>
>         Attachments: key-value-class.patch
>
>
> Let's assume a job uses different key/value classes for the output of map tasks 
> and for the final output of reduce tasks.
> When executing map tasks, the classes returned by JobConf.getMapOutputKeyClass() 
> / getMapOutputValueClass() should be used, and when executing reduce tasks, the 
> classes returned by JobConf.getOutputKeyClass() / getOutputValueClass() 
> should be used.
> Currently both map and reduce tasks will use 
> getMapOutputKeyClass/getMapOutputValueClass when using MapFileOutputFormat, 
> or they will always use getOutputKeyClass/getOutputValueClass when using 
> SequenceFileOutputFormat. This causes exceptions, because the Mapper / Reducer 
> implementations emit key/value classes other than the ones expected.
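The expected behavior described in the report can be sketched as a choice that depends on the task phase. This is a hypothetical model, not Hadoop code: `Phase`, `KeyValueClasses`, and `ClassSelection` are illustrative names, and the two class pairs stand in for what `JobConf.getMapOutputKeyClass()`/`getMapOutputValueClass()` and `getOutputKeyClass()`/`getOutputValueClass()` would return.

```java
// Which kind of task is writing records (hypothetical).
enum Phase { MAP, REDUCE }

// Hypothetical holder for a key/value class pair.
final class KeyValueClasses {
    final Class<?> key;
    final Class<?> value;

    KeyValueClasses(Class<?> key, Class<?> value) {
        this.key = key;
        this.value = value;
    }

    // True if a record emitted by a Mapper/Reducer has exactly these classes.
    boolean accepts(Object k, Object v) {
        return k.getClass() == key && v.getClass() == value;
    }
}

final class ClassSelection {
    // Map tasks write intermediate output with the map output classes;
    // reduce tasks write final output with the job's output classes.
    static KeyValueClasses expected(Phase phase,
                                    KeyValueClasses mapOutput,
                                    KeyValueClasses finalOutput) {
        return phase == Phase.MAP ? mapOutput : finalOutput;
    }
}
```

With a job whose map output value class is `Integer` and whose final output value class is `Long`, a reducer record such as `("w", 7L)` passes the REDUCE check but fails against the map output pair, which is the kind of mismatch the report describes for MapFileOutputFormat.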

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
