[
https://issues.apache.org/jira/browse/HADOOP-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616935#action_12616935
]
Owen O'Malley commented on HADOOP-1230:
---------------------------------------
Doug proposed checking the code in as we work on this patch, since it isn't
called by the rest of the code and will be far easier to review. So the new API
is in src/mapred/org/apache/hadoop/mapreduce. Notable changes since the last
patch:
* The Mapper and Reducer have a new format that combines MapRunnable with
Mapper and introduces a similar style for Reducer. Their templating is now much
easier to understand and use.
* The Mapper and Reducer base classes are now the identity functions.
* I've split the context object into a tree where the lower ones inherit from
the one above:
  * JobContext - information about the job
  * TaskAttemptContext - information about the task
  * TaskInputOutputContext - adds input and output methods for the task
  * MapperContext and ReducerContext provide the specific methods for each
* I added Job, which is how the user sets up, submits, and waits for jobs and
gets their status. Job also allows killing the job or tasks.
* I split the lib directory into parts for in, map, reduce, partition, and out
to give a little hierarchy.
* I filled in {Text,SequenceFile}{In,Out}putFormat to make sure that I had
the interfaces right.
* I changed the input methods to match the serialization factory interfaces.
* JobConf goes away, replaced by Configuration. The getter methods from
JobConf mostly move to JobContext; the setter methods mostly move to Job.
* A word count example is included. It would clearly be moved to the example
source tree for the final commit.
* I removed the number of mappers and replaced it with a max split size. The
old model was very confusing to explain.
* I used all new attribute names so that we don't have collisions with the
old attributes.
* In the new API, the Mapper owns the input key and value objects, which made
the multi-threaded mapper easier to do. I need a similar scheme in
ReduceContext.getValues.
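The run-loop style of the new Mapper, and the identity behavior of the base class, can be sketched with simplified stand-ins. Everything below is illustrative only: MapContext, SketchMapper, and InMemoryMapContext are hypothetical simplifications written for this sketch, not the committed API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for the context hierarchy described
// above; names and signatures are illustrative, not the committed API.
interface MapContext<KIN, VIN, KOUT, VOUT> {
    boolean nextKeyValue() throws IOException; // advance to the next record
    KIN getCurrentKey();
    VIN getCurrentValue();
    void write(KOUT key, VOUT value) throws IOException;
}

// New-style Mapper: map() receives a single context object instead of
// (key, value, OutputCollector, Reporter); run() absorbs MapRunnable.
abstract class SketchMapper<KIN, VIN, KOUT, VOUT> {
    @SuppressWarnings("unchecked")
    protected void map(KIN key, VIN value,
                       MapContext<KIN, VIN, KOUT, VOUT> context)
            throws IOException {
        context.write((KOUT) key, (VOUT) value); // base class is the identity
    }

    public void run(MapContext<KIN, VIN, KOUT, VOUT> context)
            throws IOException {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    }
}

// A word-count mapper in the new style, analogous to the included example.
class WordCountMapper extends SketchMapper<Long, String, String, Integer> {
    @Override
    protected void map(Long offset, String line,
                       MapContext<Long, String, String, Integer> context)
            throws IOException {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                context.write(word, 1);
            }
        }
    }
}

// Minimal in-memory context so the sketch can be exercised end to end.
class InMemoryMapContext implements MapContext<Long, String, String, Integer> {
    private final Iterator<String> lines;
    private long offset = -1;
    private String current;
    final List<String> output = new ArrayList<>();

    InMemoryMapContext(List<String> input) { this.lines = input.iterator(); }

    public boolean nextKeyValue() {
        if (!lines.hasNext()) { return false; }
        current = lines.next();
        offset++;
        return true;
    }
    public Long getCurrentKey() { return offset; }
    public String getCurrentValue() { return current; }
    public void write(String key, Integer value) {
        output.add(key + "\t" + value);
    }
}
```

Calling new WordCountMapper().run(ctx) over a few input lines emits one word/count pair per token; leaving map() unoverridden gives the identity behavior noted above.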
Missing:
* I need an interface to query jobs that were submitted by another process. A
JobTracker class that provides query options and returns Jobs is probably the
best bet.
* I didn't move TaskCompletionEvents yet.
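The proposed JobTracker query interface would hand back Job objects; a rough sketch of the Job lifecycle described above (set up, submit, wait, get status, kill) might look like the following. The class, method names, and state handling here are guesses from the description, and submission is simulated rather than real.

```java
// Hypothetical sketch of the Job lifecycle: setters configure the job
// before submission (replacing the JobConf setters), getters expose
// status afterwards (the JobContext side). Illustrative only.
class Job {
    enum State { DEFINE, RUNNING, SUCCEEDED, KILLED }

    private State state = State.DEFINE;
    private Class<?> mapperClass;

    void setMapperClass(Class<?> cls) {
        if (state != State.DEFINE) {
            throw new IllegalStateException("job already submitted");
        }
        mapperClass = cls;
    }

    State getState() { return state; }
    Class<?> getMapperClass() { return mapperClass; }

    void submit() { state = State.RUNNING; } // hand the job to the framework

    // Block until the job finishes; this simulated run always succeeds.
    boolean waitForCompletion() {
        if (state == State.RUNNING) {
            state = State.SUCCEEDED;
        }
        return state == State.SUCCEEDED;
    }

    // The comment notes Job also allows killing the job (or tasks).
    void killJob() {
        if (state == State.RUNNING) {
            state = State.KILLED;
        }
    }
}
```

The point of the sketch is the split the comment describes: mutation is only legal before submit(), while status queries remain available for the job's whole life.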
> Replace parameters with context objects in Mapper, Reducer, Partitioner,
> InputFormat, and OutputFormat classes
> --------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-1230
> URL: https://issues.apache.org/jira/browse/HADOOP-1230
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Attachments: context-objs-2.patch, context-objs-3.patch,
> context-objs.patch
>
>
> This is a big change, but it will future-proof our API's. To maintain
> backwards compatibility, I'd suggest that we move over to a new package name
> (org.apache.hadoop.mapreduce) and deprecate the old interfaces and package.
> Basically, it will replace:
> package org.apache.hadoop.mapred;
> public interface Mapper extends JobConfigurable, Closeable {
>   void map(WritableComparable key, Writable value, OutputCollector output,
>            Reporter reporter) throws IOException;
> }
> with:
> package org.apache.hadoop.mapreduce;
> public interface Mapper extends Closeable {
>   void map(MapContext context) throws IOException;
> }
> where MapContext has methods like getKey(), getValue(), collect(Key,
> Value), progress(), etc.