On Feb 10, 2009, at 5:20 PM, Matei Zaharia wrote:
I'd like to write a combiner that shares a lot of code with a reducer,
except that the reducer updates an external database at the end.
The right way to do this is to either do the update in the output
format or do something like:
class MyCombiner implements Reducer {
  ...
  public void close() throws IOException {}
}

class MyReducer extends MyCombiner {
  ...
  public void close() throws IOException { ... update database ... }
}
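Filled in, the pattern looks like the sketch below. It is dependency-free (a stand-in interface rather than Hadoop's real Reducer, and a list standing in for the external database, both invented here for illustration): the combiner's close() is a no-op, and the reducer subclass overrides it to do its end-of-task work.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in for Hadoop's old Reducer interface (illustrative only).
interface SimpleReduceTask {
    void reduce(String key, List<Integer> values) throws IOException;
    void close() throws IOException;
}

class WordCountCombiner implements SimpleReduceTask {
    public void reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        emit(key, sum);
    }
    protected void emit(String key, int sum) { /* collect output */ }
    public void close() { }   // combiner: nothing to finalize
}

class WordCountReducer extends WordCountCombiner {
    final List<String> databaseStub = new ArrayList<>();  // pretend external DB
    @Override
    public void close() {
        databaseStub.add("flushed");  // reducer-only side effect at end of task
    }
}
```

All of the reduce logic lives in one place; only the end-of-task behavior differs between the two classes.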
As far as I can tell, since both combiners and reducers must implement the Reducer interface, there is no way to have this be the same class.
There are ways to do it, but they are likely to change.
Is there a recommended way to test inside the task whether you're running as a combiner (in a map task) or as a reducer?
The question is worse than you think. In particular, the question is *not* whether you are in a map or reduce task. With current versions of Hadoop, the combiner can be called in the context of the reduce as well as the map, so what you really want to know is whether you are in a Reducer or Combiner context.
If not, I think this might be an interesting thing to support in the Hadoop 1.0 API.
It probably does make sense to add a ReduceContext.isCombiner() method to answer the question. In practice, when someone wants to use *almost* the same code for the combiner and the reducer, I get suspicious of their design.
It would enable people to write an AbstractJob class where you just implement map, combine and reduce functions, and can thus write MapReduce jobs in a single Java class.
The old api allowed this, since both Mapper and Reducer were
interfaces. The new api doesn't because they are both classes. It
wouldn't be hard to make a set of adaptors in library code that would
work. Basically, you would define a job with SimpleMapper,
SimpleCombiner, and SimpleReducer that would call Task.map,
Task.combine, and Task.reduce.
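A minimal, Hadoop-free sketch of that adaptor idea: one abstract class with map, combine, and reduce methods, plus thin adaptor classes that delegate to it. The class names follow the email (AbstractJob, SimpleMapper, SimpleReducer), but the signatures here are invented for illustration, not a real Hadoop API.

```java
import java.util.List;
import java.util.function.BiConsumer;

// The whole job is defined by implementing three methods in one class.
abstract class AbstractJob {
    abstract void map(String key, String value, BiConsumer<String, Integer> out);
    abstract void combine(String key, List<Integer> values, BiConsumer<String, Integer> out);
    abstract void reduce(String key, List<Integer> values, BiConsumer<String, Integer> out);
}

// Adaptors: each wraps the job and exposes exactly one of the three
// calls, the way SimpleMapper / SimpleCombiner / SimpleReducer would
// delegate to Task.map, Task.combine, and Task.reduce.
class SimpleMapper {
    private final AbstractJob job;
    SimpleMapper(AbstractJob job) { this.job = job; }
    void run(String key, String value, BiConsumer<String, Integer> out) {
        job.map(key, value, out);
    }
}

class SimpleReducer {
    private final AbstractJob job;
    SimpleReducer(AbstractJob job) { this.job = job; }
    void run(String key, List<Integer> values, BiConsumer<String, Integer> out) {
        job.reduce(key, values, out);
    }
}

// A word-count job then fits in a single class:
class WordCount extends AbstractJob {
    void map(String key, String line, BiConsumer<String, Integer> out) {
        for (String w : line.split("\\s+")) out.accept(w, 1);
    }
    void combine(String key, List<Integer> values, BiConsumer<String, Integer> out) {
        reduce(key, values, out);   // combine and reduce share the same sum
    }
    void reduce(String key, List<Integer> values, BiConsumer<String, Integer> out) {
        int sum = 0;
        for (int v : values) sum += v;
        out.accept(key, sum);
    }
}
```

The framework would instantiate the adaptors itself from the job class; the user only ever writes the one AbstractJob subclass.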
-- Owen