On Apr 8, 2007, at 1:48 AM, Tom White wrote:

I think we can do a lot to improve the use of generics, particularly
in MapReduce.
<... use generics in interfaces ...>

I like it. I was thrown off at first because Java classes aren't specialized based on their type parameters (erasure), but the type arguments bound in the parent class declaration are still available via reflection.
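
To make that concrete, here is a minimal sketch (the Mapper class and its type parameters are illustrative, not Hadoop's actual API): even though instances carry no type information at runtime, the type arguments a subclass binds in its `extends` clause survive in the class file and can be read back.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class TypeArgs {
    // Hypothetical generic Mapper base class, sketching the proposed interface.
    static abstract class Mapper<K1, V1, K2, V2> {
        abstract void map(K1 key, V1 value);
    }

    // A concrete mapper binds the four type parameters in its superclass
    // declaration; those bindings are recorded in the class file.
    static class WordCountMapper extends Mapper<Long, String, String, Integer> {
        void map(Long key, String value) { /* ... */ }
    }

    public static void main(String[] args) {
        // The framework can recover K1, V1, K2, V2 from the parent class.
        ParameterizedType pt =
            (ParameterizedType) WordCountMapper.class.getGenericSuperclass();
        for (Type t : pt.getActualTypeArguments()) {
            System.out.println(t.getTypeName());
        }
    }
}
```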

Reducer would be changed similarly, although I'm not sure how we could
constrain the output types of the Mapper to be the input types of the
Reducer. Perhaps via the JobConf?

That is easy, actually. In the JobClient, we'd just check to see if the types all play well together. Basically, you need:
K1, V1 -> map -> K2, V2
K2, V2 -> combiner -> K2, V2 (if used)
K2, V2 -> reduce -> K3, V3
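
The chaining above can be expressed directly in the generic signatures. A sketch (the interface and field names are illustrative, not Hadoop's real API): sharing the type variables K2 and V2 between the stages is what forces the mapper's output, the combiner's input and output, and the reducer's input to agree.

```java
public class TypedPipeline {
    // Illustrative functional interfaces for the three stages.
    interface Mapper<K1, V1, K2, V2>  { void map(K1 key, V1 value); }
    interface Reducer<K2, V2, K3, V3> { void reduce(K2 key, Iterable<V2> values); }

    // A job declaration wiring the stages together. Reusing K2 and V2 in all
    // three fields means a mismatched stage simply fails to compile.
    static class Job<K1, V1, K2, V2, K3, V3> {
        Mapper<K1, V1, K2, V2>  mapper;
        Reducer<K2, V2, K2, V2> combiner; // if used, must not change K2, V2
        Reducer<K2, V2, K3, V3> reducer;
    }

    public static void main(String[] args) {
        Job<Long, String, String, Integer, String, Integer> job = new Job<>();
        job.mapper   = (k, v)  -> {};
        job.combiner = (k, vs) -> {};
        job.reducer  = (k, vs) -> {};
        System.out.println("types line up");
    }
}
```

Of course this only helps at the declaration site; since the classes are named in the configuration, the JobClient still has to check them at submit time.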

It will be a tricky bit of specification to decide exactly what the right semantics are, since even with generics the application isn't required to declare the types. That means there are 5 places where we could find a value for K2: the config, the mapper output, the combiner input, the combiner output, or the reduce input. Once Hadoop decides the right value for each type, all of the classes clearly need to be checked for consistency against it.
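
A sketch of how that resolution could work in the JobClient (the method and slot names are hypothetical, not anything Hadoop implements today): gather every declared candidate for one type slot and require that the declared ones agree, treating undeclared sources as "no opinion".

```java
public class TypeCheck {
    // Hypothetical resolver: pick the value for one type slot (e.g. K2)
    // from its candidate sources, failing if two declared sources disagree.
    static Class<?> resolve(String slot, Class<?>... candidates) {
        Class<?> chosen = null;
        for (Class<?> c : candidates) {
            if (c == null) continue;          // source didn't declare it; skip
            if (chosen == null) chosen = c;   // first declared value wins
            else if (!chosen.equals(c))
                throw new IllegalStateException(slot + " declared as both "
                    + chosen.getName() + " and " + c.getName());
        }
        return chosen; // may be null if no source declared it
    }

    public static void main(String[] args) {
        // The five candidate sources for K2, in order: config, mapper output,
        // combiner input, combiner output, reduce input.
        Class<?> k2 = resolve("K2",
            String.class,   // config
            String.class,   // mapper output
            null,           // combiner input: not declared
            null,           // combiner output: not declared
            String.class);  // reduce input
        System.out.println("K2 = " + k2.getName());
    }
}
```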

The other piece that this interacts with is the desire to use context objects in the parameter list. However, the two appear to be orthogonal.

-- Owen
