I hacked around this in MRUnit last night. MRUnit now has support for the new API -- See MAPREDUCE-800.
You can, in fact, subclass Mapper.Context and Reducer.Context, since they don't actually share any state with the outer class Mapper/Reducer implementation, just the type signatures. But doing this requires making a dummy "Mapper" subclass to wrap around your own Context subclass, which is somewhat unsightly. C'est la vie. - Aaron On Wed, Jul 22, 2009 at 10:44 PM, David Hall<d...@cs.stanford.edu> wrote: > For what it's worth, we ended up solving this problem (today) by using > EasyMock with ClassExtension. It's an awful lot of magic, but it seems > to work just fine for our purposes. It would be great if doing > bytecode weaving under the hood weren't necessary just to write test > code, though. > > -- David > > On Wed, Jul 22, 2009 at 8:21 PM, Aaron Kimball<aa...@cloudera.com> wrote: >> er, +CC mapreduce-dev... >> - A >> >> On Wed, Jul 22, 2009 at 8:17 PM, Aaron Kimball<aa...@cloudera.com> wrote: >>> +CC mapred-dev >>> >>> Hm.. Making this change is actually really difficult. >>> >>> After changing Mapper.java, I understand why this was made a >>> non-static member. By making Context non-static, it can inherit from >>> MapContext<W, X, Y, Z> and bind to the type qualifiers already >>> specified in the class definition. So you can't make Context static >>> because it needs to inherit the type parameters from the Mapper class. >>> (Since Mapper has type qualifiers, it actually can't have static inner >>> classes anyway.) But why even have Context in there anyway, given that >>> it does nothing beyond MapContext's implementation? >>> >>> We can still change the map method's signature to use MapContext. >>> >>> As it is, you have to write: >>> class MyMapper extends Mapper<W, X, Y, Z> { >>> void map(W k, X v, Context c) { >>> .... >>> } >>> } >>> >>> To make the change, you would now need to write: >>> >>> class MyMapper extends Mapper<W, X, Y, Z> { >>> void map(W k, X v, MapContext<W,X,Y,Z> c) { >>> .... >>> } >>> } >>> >>> So I think the primary reason for the non-static inner member is to >>> save you some typing. That having been said, it makes adding mock >>> implementations of Context really difficult. >>> >>> The real reason this is a tricky change, though, is that this is >>> (surprisingly, to me) an incompatible change -- even though Context >>> subclasses MapContext, so it's a type-safe API change to widen the set >>> of inputs we accept in map(), anyone who specified @Override on the >>> method and uses Context as an argument will get a compiler error :\ >>> (@Override, it seems, uses the literal names even if there's a >>> type-safe promotion to a method in a superclass.) >>> >>> Can we still make incompatible API changes on the new API, or is it >>> officially frozen? If incompatible changes are allowed, I'd like to >>> see this in. I think that in the interest of better >>> mockability/extensibility, it'd be cleaner to ditch the inner Context >>> class in favor of explicit use of MapContext. We know Java's type >>> system is verbose, but that doesn't mean we should try to hack around >>> it, if that means losing functionality. (I think I know how to get >>> around this in MRUnit, but it'd be cleaner to not have to.) >>> >>> - Aaron >>> >>> >>> >>> On Wed, Jul 22, 2009 at 7:22 PM, Aaron Kimball<aa...@cloudera.com> wrote: >>>> Both of those are good points. I'll submit a patch. >>>> - Aaron >>>> >>>> On Wed, Jul 22, 2009 at 6:24 PM, Ted Dunning<ted.dunn...@gmail.com> wrote: >>>>> To amplify David's point, why is the argument a Mapper.Context rather than >>>>> MapContext? >>>>> >>>>> Also, why is the Mapper.Context not static? >>>>> >>>>> On Wed, Jul 22, 2009 at 5:29 PM, David Hall <d...@cs.stanford.edu> wrote: >>>>> >>>>>> This is nice, but doesn't it suffer from the same problem? MRUnit uses >>>>>> the mapred API, which is deprecated, and the new API doesn't use >>>>>> OutputCollector, but a non-static inner class. >>>>>> >>>>>> -- David >>>>>> >>>>>> On Wed, Jul 22, 2009 at 4:52 PM, Aaron Kimball<aa...@cloudera.com> wrote: >>>>>> > Hi David, >>>>>> > >>>>>> > I wrote a contrib module called MRUnit >>>>>> > (http://issues.apache.org/jira/browse/hadoop-5518) designed to allow >>>>>> > unit tests for mappers/reducers more easily. It's slated for inclusion >>>>>> > in 0.21, not 0.20 unfortunately, but you can download the patch above >>>>>> > as well as MAPREDUCE-680 and build it against any earlier version of >>>>>> > Hadoop. Unfortunately, it doesn't currently support the new APIs >>>>>> > (e.g., with Context objects), but I imagine this could be added with >>>>>> > little difficulty. I just haven't had time to do it myself ;) If you'd >>>>>> > like to take a stab at it, I'd love some help! >>>>>> > >>>>>> > More info is at www.cloudera.com/hadoop-mrunit >>>>>> > Cheers, >>>>>> > - Aaron >>>>>> > >>>>>> > On Wed, Jul 22, 2009 at 2:49 PM, David Hall<d...@cs.stanford.edu> >>>>>> > wrote: >>>>>> >> Hi, >>>>>> >> >>>>>> >> I'm a student working with Apache Mahout for the Google Summer of >>>>>> >> Code. We recently moved to 0.20.0, and I was porting my code to the >>>>>> >> new API. Unfortunately, I (and the whole project team) seem to have >>>>>> >> run into a problem when it comes to testing them. >>>>>> >> >>>>>> >> Historically, we would create a Mapper in a unit test, and a special >>>>>> >> "DummyOutputCollector", which was essentially a multimap dressed up to >>>>>> >> conform to OutputCollector. In Hadoop 0.20.0, this isn't possible >>>>>> >> anymore, because Mappers take an instance of an inner class. >>>>>> >> >>>>>> >> It's of course possible to dress up the Context in something else >>>>>> >> (say, something just like an OutputCollector), and to specify that >>>>>> >> Mahout Mappers should just delegate to a method that takes an >>>>>> >> OutputCollector. But, this seems to not be very idiomatic. >>>>>> >> >>>>>> >> All this goes to say, what would be a "best practice" for testing >>>>>> >> Mappers and Reducers in 0.20.0? >>>>>> >> >>>>>> >> Thanks, >>>>>> >> David Hall >>>>>> >> >>>>>> > >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Ted Dunning, CTO >>>>> DeepDyve >>>>> >>>> >>> >> >