Hi David,

I wrote a contrib module called MRUnit (http://issues.apache.org/jira/browse/hadoop-5518) designed to make unit testing mappers and reducers easier. Unfortunately it's slated for inclusion in 0.21, not 0.20, but you can download the patch above, along with MAPREDUCE-680, and build it against any earlier version of Hadoop. It doesn't currently support the new APIs (e.g., the ones that use Context objects), but I imagine support could be added with little difficulty. I just haven't had time to do it myself ;) If you'd like to take a stab at it, I'd love some help!
More info is at www.cloudera.com/hadoop-mrunit

Cheers,
- Aaron

On Wed, Jul 22, 2009 at 2:49 PM, David Hall <d...@cs.stanford.edu> wrote:
> Hi,
>
> I'm a student working with Apache Mahout for the Google Summer of
> Code. We recently moved to 0.20.0, and I was porting my code to the
> new API. Unfortunately, I (and the whole project team) seem to have
> run into a problem when it comes to testing them.
>
> Historically, we would create a Mapper in a unit test, and a special
> "DummyOutputCollector", which was essentially a multimap dressed up to
> conform to OutputCollector. In Hadoop 0.20.0, this isn't possible
> anymore, because Mappers take an instance of an inner class.
>
> It's of course possible to dress up the Context in something else
> (say, something just like an OutputCollector), and to specify that
> Mahout Mappers should just delegate to a method that takes an
> OutputCollector. But, this seems to not be very idiomatic.
>
> All this goes to say, what would be a "best practice" for testing
> Mappers and Reducers in 0.20.0?
>
> Thanks,
> David Hall
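
For illustration, the delegation approach described in the quoted message (mapper logic written against a collector-style interface, with a multimap-backed dummy used in tests) might be sketched roughly as below. This is only a sketch under assumptions: `Collector`, `DummyCollector`, and `WordCountLogic` are hypothetical names made up for this example, and `Collector` is a local stand-in for Hadoop's `OutputCollector` so that the code compiles without Hadoop on the classpath. It is neither Mahout's actual DummyOutputCollector nor the MRUnit API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's old-API OutputCollector,
// declared locally so the sketch has no Hadoop dependency.
interface Collector<K, V> {
    void collect(K key, V value);
}

// A multimap dressed up as a collector: it records every emitted
// (key, value) pair so a unit test can inspect them afterwards.
class DummyCollector<K, V> implements Collector<K, V> {
    private final Map<K, List<V>> data = new LinkedHashMap<>();

    @Override
    public void collect(K key, V value) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Values emitted for a key, in emission order (empty if none).
    List<V> valuesFor(K key) {
        List<V> values = data.get(key);
        return values == null ? Collections.<V>emptyList() : values;
    }
}

// Hypothetical mapper logic. In a real 0.20 Mapper, map(key, value, context)
// would delegate here, passing a collector that forwards to context.write().
class WordCountLogic {
    void map(String line, Collector<String, Integer> out) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.collect(word, 1);
            }
        }
    }
}

class DummyCollectorDemo {
    public static void main(String[] args) {
        DummyCollector<String, Integer> out = new DummyCollector<>();
        new WordCountLogic().map("a b a", out);
        System.out.println(out.valuesFor("a")); // prints [1, 1]
        System.out.println(out.valuesFor("b")); // prints [1]
    }
}
```

The trade-off is the one raised in the quoted message: the logic becomes testable without constructing a Context, at the cost of an extra layer of indirection that is not part of the new API's usual style.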