I spent a lot of time trying to get the mapside joins in
DistributedRowMatrix.multiply() to work without the .mapred.join package, and
it simply can't be done without some major hacks, at least not until the Hadoop
folks decide to include the mapreduce.lib.join.* package in a stable release.

iPhone'd

On Jan 14, 2012, at 12:03, Sean Owen <[email protected]> wrote:

> True that but I think most of the use of .mapred. is not of this form.
> It's still using the old Mappers and Reducers and InputFormats and
> such. Maybe it's all actually somehow necessary to still use
> ChainReducer or MultipleInputs though my impression was that most of
> it was not.
> 
> For example right now I see no use of MultipleInputs, ChainMapper or
> ChainReducer. There are some uses of MultipleOutputs, in ssvd. But for
> example I do not see anything that keeps the Bayes code from using
> .mapreduce., and I think this is most of what I'm referring to. Is
> anyone working on this anymore?
> 
> On Sat, Jan 14, 2012 at 4:52 PM, Jake Mannix <[email protected]> wrote:
>> Re: o.a.h.mapred package dependency: haven't we been over this a thousand
>> times?
>> 
>> If we are not *forcing* our users to upgrade Hadoop past 0.20.2-ish, and we
>> want to have nice things like mapside joins, ChainMapper/ChainReducer, and
>> MultipleOutputs, then we're sometimes stuck in the old-and-faded API of
>> yesteryear (org.apache.hadoop.mapred).  Am I forgetting some trick which
>> allows us to get around this, or some decision we made which makes this not
>> relevant?
>> 
>>  -jake
>> 
