[
https://issues.apache.org/jira/browse/HADOOP-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616935#action_12616935
]
Owen O'Malley commented on HADOOP-1230:
---------------------------------------
Doug proposed checking the code in as we work on this patch, since it isn't
called by the rest of the code and will be far easier to review. So the new API
is in src/mapred/org/apache/hadoop/mapreduce. Notable changes since the last
patch:
* The Mapper and Reducer have a new format that combines MapRunnable with
Mapper and introduces a similar style for Reducer. Their templating is now much
easier to understand and use.
* The Mapper and Reducer base classes are now the identity functions.
* I've split the context object into a tree where the lower ones inherit from
the one above:
  * JobContext - information about the job
  * TaskAttemptContext - information about the task
  * TaskInputOutputContext - adds input and output methods for the task
  * MapperContext and ReducerContext provide the specific methods for each
* I added Job, which is how the user sets up, submits, and waits for jobs and
gets their status. Job also allows killing the job or tasks.
* I split the lib directory into parts for in, map, reduce, partition, and out
to give a little hierarchy.
* I filled in {Text,SequenceFile}{In,Out}putFormat to make sure that I had
the interfaces right.
* I changed the input methods to match the serialization factory interfaces.
* JobConf goes away, replaced by Configuration. The getter methods from
JobConf mostly move to JobContext; the setter methods mostly move to Job.
* A word count example is included. It would clearly be moved to the example
source tree for the final commit.
* I removed the number of mappers and replaced it with a max split size. The
old model was very confusing to explain.
* I used all new attribute names so that we don't have collisions with the
old attributes.
* In the new API, the Mapper owns the input key and value objects, which made
the multi-threaded mapper easier to do. I need a similar scheme in
ReduceContext.getValues.
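The run-loop style of the new Mapper, and the identity behavior of the base class, can be sketched with simplified stand-ins. Everything below is illustrative only: MapContext, SketchMapper, and InMemoryMapContext are hypothetical simplifications written for this sketch, not the committed API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for the context hierarchy described
// above; names and signatures are illustrative, not the committed API.
interface MapContext<KIN, VIN, KOUT, VOUT> {
    boolean nextKeyValue() throws IOException; // advance to the next record
    KIN getCurrentKey();
    VIN getCurrentValue();
    void write(KOUT key, VOUT value) throws IOException;
}

// New-style Mapper: map() receives a single context object instead of
// (key, value, OutputCollector, Reporter); run() absorbs MapRunnable.
abstract class SketchMapper<KIN, VIN, KOUT, VOUT> {
    @SuppressWarnings("unchecked")
    protected void map(KIN key, VIN value,
                       MapContext<KIN, VIN, KOUT, VOUT> context)
            throws IOException {
        context.write((KOUT) key, (VOUT) value); // base class is the identity
    }

    public void run(MapContext<KIN, VIN, KOUT, VOUT> context)
            throws IOException {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    }
}

// A word-count mapper in the new style, analogous to the included example.
class WordCountMapper extends SketchMapper<Long, String, String, Integer> {
    @Override
    protected void map(Long offset, String line,
                       MapContext<Long, String, String, Integer> context)
            throws IOException {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                context.write(word, 1);
            }
        }
    }
}

// Minimal in-memory context so the sketch can be exercised end to end.
class InMemoryMapContext implements MapContext<Long, String, String, Integer> {
    private final Iterator<String> lines;
    private long offset = -1;
    private String current;
    final List<String> output = new ArrayList<>();

    InMemoryMapContext(List<String> input) { this.lines = input.iterator(); }

    public boolean nextKeyValue() {
        if (!lines.hasNext()) { return false; }
        current = lines.next();
        offset++;
        return true;
    }
    public Long getCurrentKey() { return offset; }
    public String getCurrentValue() { return current; }
    public void write(String key, Integer value) {
        output.add(key + "\t" + value);
    }
}
```

Calling new WordCountMapper().run(ctx) over a few input lines emits one word/count pair per token; leaving map() unoverridden gives the identity behavior noted above.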
Missing:
* I need an interface to query jobs that were submitted by another process. A
JobTracker class that provides query options and returns Jobs is probably the
best bet.
* I didn't move TaskCompletionEvents yet.
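The proposed JobTracker query interface would hand back Job objects; a rough sketch of the Job lifecycle described above (set up, submit, wait, get status, kill) might look like the following. The class, method names, and state handling here are guesses from the description, and submission is simulated rather than real.

```java
// Hypothetical sketch of the Job lifecycle: setters configure the job
// before submission (replacing the JobConf setters), getters expose
// status afterwards (the JobContext side). Illustrative only.
class Job {
    enum State { DEFINE, RUNNING, SUCCEEDED, KILLED }

    private State state = State.DEFINE;
    private Class<?> mapperClass;

    void setMapperClass(Class<?> cls) {
        if (state != State.DEFINE) {
            throw new IllegalStateException("job already submitted");
        }
        mapperClass = cls;
    }

    State getState() { return state; }
    Class<?> getMapperClass() { return mapperClass; }

    void submit() { state = State.RUNNING; } // hand the job to the framework

    // Block until the job finishes; this simulated run always succeeds.
    boolean waitForCompletion() {
        if (state == State.RUNNING) {
            state = State.SUCCEEDED;
        }
        return state == State.SUCCEEDED;
    }

    // The comment notes Job also allows killing the job (or tasks).
    void killJob() {
        if (state == State.RUNNING) {
            state = State.KILLED;
        }
    }
}
```

The point of the sketch is the split the comment describes: mutation is only legal before submit(), while status queries remain available for the job's whole life.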
> Replace parameters with context objects in Mapper, Reducer, Partitioner,
> InputFormat, and OutputFormat classes
> --------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-1230
> URL: https://issues.apache.org/jira/browse/HADOOP-1230
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Attachments: context-objs-2.patch, context-objs-3.patch,
> context-objs.patch
>
>
> This is a big change, but it will future-proof our API's. To maintain
> backwards compatibility, I'd suggest that we move over to a new package name
> (org.apache.hadoop.mapreduce) and deprecate the old interfaces and package.
> Basically, it will replace:
> package org.apache.hadoop.mapred;
> public interface Mapper extends JobConfigurable, Closeable {
>   void map(WritableComparable key, Writable value, OutputCollector output,
>            Reporter reporter) throws IOException;
> }
> with:
> package org.apache.hadoop.mapreduce;
> public interface Mapper extends Closeable {
>   void map(MapContext context) throws IOException;
> }
> where MapContext has methods like getKey(), getValue(), collect(Key,
> Value), progress(), etc.