On Tue, Sep 25, 2012 at 10:47 AM, Matthias Friedrich <[email protected]> wrote:
> On Monday, 2012-09-24, Josh Wills wrote:
> [...]
>> This turns out to be kind of tricky to do no matter how we approach
>> the problem, because for this to work, we'll need to (at a minimum)
>> subclass the Mapper.Context and Reducer.Context classes that are
>> passed to the Mapper and Reducer instances, and they have different
>> implementations (most importantly for our purposes, different
>> constructors) under Hadoop 1 and 2.
>>
>> It feels to me that what I need to do is create a separate subproject
>> that has to do some crazy stuff (e.g., use different source
>> directories depending on the value of the crunch.platform variable) in
>> order to be able to create the appropriate kind of subclass of
>> Mapper.Context or Reducer.Context. But this sort of thing seems like
>> such a bad idea that there must be some sort of less-bad option
>> available to me, and I wanted to solicit input before I start tilting
>> at this particular windmill.
>
> I haven't looked at the problem in detail, but perhaps you can get
> away with a bit of reflection. Get the class object of Mapper.Context,
> get its Constructors to figure out if you're running Hadoop 1 or 2,
> and then use Constructor.newInstance() with the right argument list
> depending on the version of Hadoop.
Yeah, that would be my ideal, and is actually how we handle the
TaskAttemptContext class (which is an interface in Hadoop 2 but a class
in Hadoop 1) when we need it in Crunch now. But this case is trickier:
I need to actually subclass Mapper.Context and Reducer.Context for this
to work. The subclasses themselves are straightforward -- they delegate
essentially everything to the underlying Mapper.Context or
Reducer.Context object, with the exception of the write() method, which
gets handled by the Emitter object. The only catch is that the
subclasses need to call the parent constructor in their own
constructors, and those constructor signatures are different in
Hadoop 1 and 2.

> This doesn't feel a lot better than the Maven-based solution, but at
> least you can encapsulate it deep in a class somewhere ;-)
>
> Regards,
>   Matthias

--
Director of Data Science
Cloudera
Twitter: @josh_wills
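
(A minimal sketch of the version probe discussed above, assuming only what the
thread states: TaskAttemptContext is an interface in Hadoop 2 but a class in
Hadoop 1, and Mapper.Context's constructor takes a different argument list in
the two versions. The class and method names below are illustrative, not part
of Crunch.)

import java.lang.reflect.Constructor;

import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Illustrative only: figure out which MapReduce API is on the classpath.
public final class HadoopVersionProbe {

  private HadoopVersionProbe() {}

  // TaskAttemptContext became an interface in Hadoop 2; in Hadoop 1 it is a class.
  public static boolean isHadoop2() {
    return TaskAttemptContext.class.isInterface();
  }

  // The variant Matthias suggests: look at Mapper.Context's constructors.
  // Hadoop 1's Context constructor takes a long explicit argument list
  // (configuration, task id, record reader/writer, committer, reporter, split);
  // Hadoop 2's Context is abstract with no explicit constructor arguments, so
  // reflection only sees the implicit enclosing-Mapper parameter.
  public static boolean hasHadoop1StyleContext() {
    for (Constructor<?> ctor : Mapper.Context.class.getDeclaredConstructors()) {
      if (ctor.getParameterTypes().length > 1) {
        return true;
      }
    }
    return false;
  }
}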

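(And a sketch of the Constructor.newInstance() half of the idea, applied to the
TaskAttemptContext case that already works this way in Crunch. The factory
class below is hypothetical; it assumes the concrete implementation each Hadoop
line ships -- org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl in
Hadoop 2, the concrete org.apache.hadoop.mapreduce.TaskAttemptContext class in
Hadoop 1 -- each with a (Configuration, TaskAttemptID) constructor.)

import java.lang.reflect.Constructor;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;

// Hypothetical factory, not Crunch's actual code: construct a TaskAttemptContext
// without compiling against either version's concrete implementation class.
public final class TaskAttemptContexts {

  private static final String HADOOP2_IMPL =
      "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl";
  private static final String HADOOP1_IMPL =
      "org.apache.hadoop.mapreduce.TaskAttemptContext";

  private TaskAttemptContexts() {}

  public static TaskAttemptContext create(Configuration conf, TaskAttemptID taskId)
      throws Exception {
    // Pick whichever concrete class is on the classpath, then call its
    // (Configuration, TaskAttemptID) constructor via Constructor.newInstance().
    String impl = TaskAttemptContext.class.isInterface() ? HADOOP2_IMPL : HADOOP1_IMPL;
    Constructor<?> ctor =
        Class.forName(impl).getConstructor(Configuration.class, TaskAttemptID.class);
    return (TaskAttemptContext) ctor.newInstance(conf, taskId);
  }
}

(The same trick does not carry over to Mapper.Context and Reducer.Context: there
the goal is to subclass and override write(), so the version-specific super(...)
call has to appear in source somewhere, which is what pushes the discussion
toward per-version source directories.)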