Hi David,

I wrote a contrib module called MRUnit
(http://issues.apache.org/jira/browse/hadoop-5518) designed to make
unit testing mappers and reducers easier. It's slated for inclusion
in 0.21, not 0.20 unfortunately, but you can download the patch above
as well as MAPREDUCE-680 and build it against any earlier version of
Hadoop. It doesn't yet support the new APIs (the ones that use
Context objects), but I imagine this could be added with
little difficulty. I just haven't had time to do it myself ;) If you'd
like to take a stab at it, I'd love some help!

More info is at www.cloudera.com/hadoop-mrunit

Cheers,
- Aaron

On Wed, Jul 22, 2009 at 2:49 PM, David Hall <d...@cs.stanford.edu> wrote:
> Hi,
>
> I'm a student working with Apache Mahout for the Google Summer of
> Code. We recently moved to 0.20.0, and I was porting my code to the
> new API. Unfortunately, I (and the whole project team) seem to have
> run into a problem when it comes to testing our Mappers and Reducers.
>
> Historically, we would create a Mapper in a unit test, and a special
> "DummyOutputCollector", which was essentially a multimap dressed up to
> conform to OutputCollector. In Hadoop 0.20.0, this isn't possible
> anymore, because Mapper.map now takes a Context, which is an inner
> class of Mapper.
>
> It's of course possible to dress up the Context in something else
> (say, something just like an OutputCollector), and to specify that
> Mahout Mappers should just delegate to a method that takes an
> OutputCollector. But this doesn't seem very idiomatic.
>
> All this is to say, what would be a "best practice" for testing
> Mappers and Reducers in 0.20.0?
>
> Thanks,
> David Hall
>
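For anyone reading this thread later, here is a rough sketch of the two ideas David describes: a "DummyOutputCollector" backed by a multimap, and mapper logic that delegates to a method taking a collector. To keep it runnable without Hadoop on the classpath, the Collector interface below is a hypothetical stand-in for Hadoop's OutputCollector, and TokenCountLogic is an invented example, not Mahout or MRUnit code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: none of these types come from Hadoop, Mahout, or MRUnit. */
class DummyCollectorSketch {

    /** Hypothetical stand-in for an OutputCollector-style interface. */
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    /** Test double that records every emitted pair in a sorted multimap. */
    static final class DummyOutputCollector<K extends Comparable<K>, V>
            implements Collector<K, V> {
        private final Map<K, List<V>> data = new TreeMap<>();

        @Override
        public void collect(K key, V value) {
            data.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }

        /** Values emitted for a key, in emission order; empty if none. */
        List<V> getValues(K key) {
            return data.getOrDefault(key, Collections.emptyList());
        }

        Map<K, List<V>> getData() {
            return data;
        }
    }

    /** Hypothetical mapper logic, kept free of framework types so a test can call it directly. */
    static final class TokenCountLogic {
        void map(String line, Collector<String, Integer> out) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    out.collect(token, 1); // emit (token, 1) for each word
                }
            }
        }
    }

    public static void main(String[] args) {
        DummyOutputCollector<String, Integer> out = new DummyOutputCollector<>();
        new TokenCountLogic().map("to be or not to be", out);
        System.out.println(out.getData());
    }
}
```

With the new API, a real Mapper's map(key, value, context) would presumably forward to a method like this through a small adapter whose collect() calls context.write(key, value). That is the delegation approach David mentions, with the trade-off he notes: the adapter itself remains untested.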
