[ https://issues.apache.org/jira/browse/HADOOP-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617780#action_12617780 ]

Tom White commented on HADOOP-1230:
-----------------------------------

Overall this looks great. I think the interface is much more approachable than 
the last one.

1. What is the contract for cleanup()? Is it called if map()/reduce() throws an 
exception? I think it should be, so Mapper/Reducer#run should call cleanup() in 
a finally clause.
2. One of the things that the previous version supported was a flexible way of 
handling large value classes. If your value is huge you may not want to 
deserialize it into an object, but instead read the byte stream directly. This 
isn't part of this issue, but I think the current approach will support it by 
i) adding streaming accessors to the context, ii) overriding the run() method 
to pass in a null value, so map()/reduce() implementations get the value byte 
stream from the context. (More generally, this might be the approach to support 
HADOOP-2429.) Does this sound right?
3. ReduceContext could be made to implement Iterable<VALUEIN>, to make it 
slightly more concise to iterate over the values (for expert use in the run 
method). The reduce method would be unchanged.
4. Although not a hard requirement, it would be nice to make the user API 
serialization agnostic. I think we can make InputSplit not implement Writable, 
and use a SerializationFactory to serialize splits. Most implementations would 
be Writable, but they don't have to be. Counter and ID are Writable, but I 
think that's probably OK as they are not meant to be subclassed. (Having said 
that though, exposing them as interfaces in the API would allow us to remove 
the dependency on Writable, which is an implementation detail.)
5. Is this a good opportunity to make TextInputFormat extend 
FileInputFormat<Text, NullWritable>, like HADOOP-3566?
6. JobContext#getGroupingComparator has javadoc that refers to 
WritableComparable, when it should refer to RawComparator.
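A minimal Java sketch of points 1 and 3, using hypothetical, simplified SimpleReduceContext, SimpleReducer, and SumReducer classes (not the classes in the attached patches). run() calls cleanup() in a finally clause, so cleanup happens even when reduce() throws. Because the context implements Iterable<VALUEIN>, run() can pass it directly as the values argument of an unchanged reduce(key, values, context).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for ReduceContext. It implements Iterable<V> so the
// values for the current key can be walked with a for-each loop (point 3).
class SimpleReduceContext<K, V> implements Iterable<V> {
  private final Iterator<Map.Entry<K, List<V>>> groups;
  private Map.Entry<K, List<V>> current;
  final List<String> output = new ArrayList<>();

  SimpleReduceContext(TreeMap<K, List<V>> input) {
    this.groups = input.entrySet().iterator();
  }

  // Advances to the next key group; returns false when input is exhausted.
  boolean nextKey() {
    if (!groups.hasNext()) return false;
    current = groups.next();
    return true;
  }

  K getCurrentKey() { return current.getKey(); }

  // Iterates over the values grouped under the current key.
  @Override
  public Iterator<V> iterator() { return current.getValue().iterator(); }

  void write(K key, Object value) { output.add(key + "=" + value); }
}

// Hypothetical base class showing the proposed cleanup() contract (point 1).
abstract class SimpleReducer<K, V> {
  protected void setup(SimpleReduceContext<K, V> context) {}

  protected abstract void reduce(K key, Iterable<V> values,
                                 SimpleReduceContext<K, V> context);

  protected void cleanup(SimpleReduceContext<K, V> context) {}

  public void run(SimpleReduceContext<K, V> context) {
    setup(context);
    try {
      while (context.nextKey()) {
        // The context doubles as the Iterable of values for the current key,
        // so reduce() keeps its usual signature.
        reduce(context.getCurrentKey(), context, context);
      }
    } finally {
      // Runs whether reduce() returns normally or throws.
      cleanup(context);
    }
  }
}

// Example reducer: sums integer values; throws on a negative value to
// simulate a failing reduce() call.
class SumReducer extends SimpleReducer<String, Integer> {
  boolean cleanedUp = false;

  @Override
  protected void reduce(String key, Iterable<Integer> values,
                        SimpleReduceContext<String, Integer> context) {
    int sum = 0;
    for (int v : values) {
      if (v < 0) throw new IllegalArgumentException("negative value for " + key);
      sum += v;
    }
    context.write(key, sum);
  }

  @Override
  protected void cleanup(SimpleReduceContext<String, Integer> context) {
    cleanedUp = true;
  }
}
```

In this sketch setup() sits outside the try block, so cleanup() only runs once setup() has completed; whether that is the desired contract is part of the question in point 1.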

> Replace parameters with context objects in Mapper, Reducer, Partitioner, 
> InputFormat, and OutputFormat classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1230
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1230
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: context-objs-2.patch, context-objs-3.patch, context-objs.patch
>
>
> This is a big change, but it will future-proof our APIs. To maintain 
> backwards compatibility, I'd suggest that we move over to a new package name 
> (org.apache.hadoop.mapreduce) and deprecate the old interfaces and package. 
> Basically, it will replace:
> package org.apache.hadoop.mapred;
> public interface Mapper extends JobConfigurable, Closeable {
>   void map(WritableComparable key, Writable value, OutputCollector output,
>            Reporter reporter) throws IOException;
> }
> with:
> package org.apache.hadoop.mapreduce;
> public interface Mapper extends Closeable {
>   void map(MapContext context) throws IOException;
> }
> where MapContext has the methods like getKey(), getValue(), collect(Key, 
> Value), progress(), etc.
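A sketch of the shape proposed in the description above, assuming a hypothetical generic MapContext with the getKey(), getValue(), collect() and progress() methods it lists, and a hypothetical ContextMapper interface standing in for the proposed Mapper. ListMapContext is an in-memory stand-in for illustration only; none of these names come from the attached patches.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical context modeled on the methods named in the proposal.
interface MapContext<KI, VI, KO, VO> {
  KI getKey();
  VI getValue();
  void collect(KO key, VO value) throws IOException;
  void progress();
}

// Proposed shape: one context argument replaces key, value,
// OutputCollector, and Reporter.
interface ContextMapper<KI, VI, KO, VO> extends Closeable {
  void map(MapContext<KI, VI, KO, VO> context) throws IOException;
}

// Example: emits (word, 1) for each whitespace-separated word in the value.
class WordCountMapper implements ContextMapper<Long, String, String, Integer> {
  @Override
  public void map(MapContext<Long, String, String, Integer> context) throws IOException {
    for (String word : context.getValue().trim().split("\\s+")) {
      if (!word.isEmpty()) {
        context.collect(word, 1);
      }
    }
    context.progress(); // report liveness once per record
  }

  @Override
  public void close() {}
}

// In-memory context that records collected pairs as "key<TAB>value".
class ListMapContext<KI, VI, KO, VO> implements MapContext<KI, VI, KO, VO> {
  private final KI key;
  private final VI value;
  final List<String> collected = new ArrayList<>();
  int progressCalls = 0;

  ListMapContext(KI key, VI value) {
    this.key = key;
    this.value = value;
  }

  public KI getKey() { return key; }
  public VI getValue() { return value; }
  public void collect(KO k, VO v) { collected.add(k + "\t" + v); }
  public void progress() { progressCalls++; }
}
```

Because the mapper only talks to the context, methods can later be added to MapContext without changing the map() signature, which is the future-proofing argument made in the description.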

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
