[ https://issues.apache.org/jira/browse/HADOOP-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620071#action_12620071 ]

Owen O'Malley commented on HADOOP-1230:
---------------------------------------

Alejandro,
  You seem to be looking at the wrong code. I committed into trunk the current 
version of the proposed API. I still think that adding methods to the mapper is 
*far* more natural than making a wrapping output context. The two approaches 
would look like:

*Option 1*

Works for subclasses that override run, the map method, or both:

{code}
class MultipleOutputMapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
  extends Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
  protected <KEY,VALUE> void collect(String output, KEY key, VALUE value)
      throws IOException {...}
}
{code}
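For illustration, a user's mapper written against Option 1 might look like the sketch below. The output name "errors" and the key/value types are hypothetical, not part of the proposal:

{code}
public class MyMapper
    extends MultipleOutputMapper<LongWritable, Text, Text, IntWritable> {
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (value.getLength() == 0) {
      // "errors" is a hypothetical named output; collect is just an
      // inherited method, so no special context type is needed
      collect("errors", new Text("empty line"), new IntWritable(1));
    } else {
      context.write(value, new IntWritable(1));
    }
  }
}
{code}

Because collect is an ordinary inherited method, the same call works unchanged from a subclass that overrides run instead of map.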

*Option 2*

Works only for subclasses that override the map method:

{code}
public abstract class MultipleOutputMapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
  extends Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {

  protected class MultipleOutputContext extends Context {
    protected MultipleOutputContext(Context outerContext) { ... }
    void collect(String output, KEYOUT key, VALUEOUT value)
        throws IOException {...}
  }

  protected void setup(MultipleOutputContext context
                       ) throws IOException, InterruptedException {
  }

  protected abstract void map(KEYIN key, VALUEIN value,
                              MultipleOutputContext context
                              ) throws IOException, InterruptedException;

  protected void cleanup(MultipleOutputContext context
                         ) throws IOException, InterruptedException {
  }

  public void run(Context outerContext
                  ) throws IOException, InterruptedException {
    MultipleOutputContext context = new MultipleOutputContext(outerContext);
    setup(context);
    KEYIN key = context.nextKey(null);
    VALUEIN value = null;
    while (key != null) {
      value = context.nextValue(value);
      map(key, value, context);
      key = context.nextKey(key);
    }
    cleanup(context);
  }
}
{code}

Note that these are *NOT* overrides of Mapper.setup, map, and cleanup, but 
instead are overloads of them.

I think that option 1 is cleaner, but either one should work.
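Stripped of the Hadoop generics, the structural difference between the two options can be modeled in plain Java. Everything below is a stand-in for the proposed API (String keys and values, simplified run loops), not the real classes:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the proposed Context type.
class Context {
  final List<String> mainOutput = new ArrayList<>();
  void write(String key, String value) { mainOutput.add(key + "=" + value); }
}

// Option 1 model: the extra method lives on the mapper itself.
class MultipleOutputMapper1 {
  final List<String> sideOutput = new ArrayList<>();
  protected void collect(String output, String key, String value) {
    sideOutput.add(output + ":" + key + "=" + value);
  }
  public void run(Context context) { map("k", "v", context); }
  protected void map(String key, String value, Context context) {}
}

// A subclass that overrides run still reaches collect directly,
// because it is an ordinary inherited method.
class RunOverridingMapper extends MultipleOutputMapper1 {
  @Override public void run(Context context) {
    collect("errors", "k", "v");   // works from run as well as map
    context.write("k", "v");
  }
}

// Option 2 model: the extra method lives on a wrapping context that
// the base run() constructs; only map() ever receives the wrapper.
class MultipleOutputMapper2 {
  class MultipleOutputContext extends Context {
    final Context outer;
    final List<String> sideOutput = new ArrayList<>();
    MultipleOutputContext(Context outer) { this.outer = outer; }
    void collect(String output, String key, String value) {
      sideOutput.add(output + ":" + key + "=" + value);
    }
  }
  public void run(Context outerContext) {
    MultipleOutputContext context = new MultipleOutputContext(outerContext);
    map("k", "v", context);
  }
  protected void map(String key, String value,
                     MultipleOutputContext context) {}
  // A user who overrides run(Context) receives the plain Context and
  // never sees a MultipleOutputContext -- the limitation noted above.
}

public class OptionDemo {
  public static void main(String[] args) {
    Context ctx = new Context();
    RunOverridingMapper m = new RunOverridingMapper();
    m.run(ctx);
    System.out.println(m.sideOutput);   // prints [errors:k=v]
    System.out.println(ctx.mainOutput); // prints [k=v]
  }
}
```

The model makes the trade-off concrete: Option 1 puts collect on the inheritance chain, so every override point sees it; Option 2 ties it to a wrapper object that only the base-class run loop constructs.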

> Replace parameters with context objects in Mapper, Reducer, Partitioner, 
> InputFormat, and OutputFormat classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1230
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1230
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: context-objs-2.patch, context-objs-3.patch, 
> context-objs.patch
>
>
> This is a big change, but it will future-proof our APIs. To maintain 
> backwards compatibility, I'd suggest that we move over to a new package name 
> (org.apache.hadoop.mapreduce) and deprecate the old interfaces and package. 
> Basically, it will replace:
> package org.apache.hadoop.mapred;
> public interface Mapper extends JobConfigurable, Closeable {
>   void map(WritableComparable key, Writable value, OutputCollector output, 
> Reporter reporter) throws IOException;
> }
> with:
> package org.apache.hadoop.mapreduce;
> public interface Mapper extends Closeable {
>   void map(MapContext context) throws IOException;
> }
> where MapContext has the methods like getKey(), getValue(), collect(Key, 
> Value), progress(), etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
