+CC mapred-dev

Hm.. Making this change is actually really difficult.

After changing Mapper.java, I understand why this was made a
non-static member.  By making Context non-static, it can inherit from
MapContext<W, X, Y, Z> and bind to the type qualifiers already
specified in the class definition. So you can't make Context static
because it needs to inherit the type parameters from the Mapper class.
(Since Mapper has type qualifiers, it actually can't have static inner
classes anyway.) But why even have Context in there anyway, given that
it does nothing beyond MapContext's implementation?

We can still change the map method's signature to use MapContext.

As it is, you have to write:
class MyMapper extends Mapper<W, X, Y, Z> {
  void map(W k, X v, Context c) {
     ....
  }
}

To make the change, you would now need to write:

class MyMapper extends Mapper<W, X, Y, Z> {
  void map(W k, X v, MapContext<W,X,Y,Z> c) {
     ....
  }
}

So I think the primary reason for the non-static inner member is to
save you some typing. That having been said, it makes adding mock
implementations of Context really difficult.

The real reason this is a tricky change, though, is that this is
(surprisingly, to me) an incompatible change -- even though Context
subclasses MapContext, so it's a type-safe API change to widen the set
of inputs we accept in map(), anyone who specified @Override on the
method and uses Context as an argument will get a compiler error :\
(@Override, it seems, uses the literal names even if there's a
type-safe promotion to a method in a superclass.)

Can we still make incompatible API changes on the new API, or is it
officially frozen? If incompatible changes are allowed, I'd like to
see this in. I think that in the interest of better
mockability/extensibility, it'd be cleaner to ditch the inner Context
class in favor of explicit use of MapContext. We know Java's type
system is verbose, but that doesn't mean we should try to hack around
it, if that means losing functionality. (I think I know how to get
around this in MRUnit, but it'd be cleaner to not have to.)

- Aaron



On Wed, Jul 22, 2009 at 7:22 PM, Aaron Kimball<aa...@cloudera.com> wrote:
> Both of those are good points. I'll submit a patch.
> - Aaron
>
> On Wed, Jul 22, 2009 at 6:24 PM, Ted Dunning<ted.dunn...@gmail.com> wrote:
>> To amplify David's point, why is the argument a Mapper.Context rather than
>> MapContext?
>>
>> Also, why is the Mapper.Context not static?
>>
>> On Wed, Jul 22, 2009 at 5:29 PM, David Hall <d...@cs.stanford.edu> wrote:
>>
>>> This is nice, but doesn't it suffer from the same problem? MRUnit uses
>>> the mapred API, which is deprecated, and the new API doesn't use
>>> OutputCollector, but a non-static inner class.
>>>
>>> -- David
>>>
>>> On Wed, Jul 22, 2009 at 4:52 PM, Aaron Kimball<aa...@cloudera.com> wrote:
>>> > Hi David,
>>> >
>>> > I wrote a contrib module called MRUnit
>>> > (http://issues.apache.org/jira/browse/hadoop-5518) designed to allow
>>> > unit tests for mappers/reducers more easily. It's slated for inclusion
>>> > in 0.21, not 0.20 unfortunately, but you can download the patch above
>>> > as well as MAPREDUCE-680 and build it against any earlier version of
>>> > Hadoop. Unfortunately, it doesn't currently support the new APIs
>>> > (e.g., with Context objects), but I imagine this could be added with
>>> > little difficulty. I just haven't had time to do it myself ;) If you'd
>>> > like to take a stab at it, I'd love some help!
>>> >
>>> > More info is at www.cloudera.com/hadoop-mrunit
>>> > Cheers,
>>> > - Aaron
>>> >
>>> > On Wed, Jul 22, 2009 at 2:49 PM, David Hall<d...@cs.stanford.edu> wrote:
>>> >> Hi,
>>> >>
>>> >> I'm a student working with Apache Mahout for the Google Summer of
>>> >> Code. We recently moved to 0.20.0, and I was porting my code to the
>>> >> new API. Unfortunately, I (and the whole project team) seem to have
>>> >> run into a problem when it comes to testing them.
>>> >>
>>> >> Historically, we would create a Mapper in a unit test, and a special
>>> >> "DummyOutputCollector", which was essentially a multimap dressed up to
>>> >> conform to OutputCollector. In Hadoop 0.20.0, this isn't possible
>>> >> anymore, because Mappers take an instance of an inner class.
>>> >>
>>> >> It's of course possible to dress up the Context in something else
>>> >> (say, something just like an OutputCollector), and to specify that
>>> >> Mahout Mappers should just delegate to a method that takes an
>>> >> OutputCollector. But, this seems to not be very idiomatic.
>>> >>
>>> >> All this goes to say, what would be a "best practice" for testing
>>> >> Mappers and Reducers in 0.20.0?
>>> >>
>>> >> Thanks,
>>> >> David Hall
>>> >>
>>> >
>>>
>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>

Reply via email to