I spent a lot of time trying to get the map-side joins in DistributedRowMatrix.multiply() to work without the .mapred.join package, and it simply can't be done without some major hacks, at least not until the Hadoop folks decide to include the mapreduce.lib.join.* package in a stable release.
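
For anyone unfamiliar with the term: a map-side join is roughly a merge of two inputs that are already sorted by key and partitioned the same way, so matching records can be paired inside the mapper with no shuffle. Below is a minimal sketch of that merge step in plain Java, with no Hadoop dependency. The class and method names are hypothetical, it assumes each key appears at most once per input, and it is not the DistributedRowMatrix code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the merge at the heart of a map-side join.
// Assumes both inputs are sorted by key and each key is unique per input.
public class MergeJoinSketch {

    // Inner join: emit "key:leftValue,rightValue" for keys present in both inputs.
    public static List<String> innerJoin(List<Map.Entry<Integer, String>> left,
                                         List<Map.Entry<Integer, String>> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            Map.Entry<Integer, String> l = left.get(i);
            Map.Entry<Integer, String> r = right.get(j);
            int cmp = Integer.compare(l.getKey(), r.getKey());
            if (cmp < 0) {
                i++;            // key only on the left so far: skip it
            } else if (cmp > 0) {
                j++;            // key only on the right so far: skip it
            } else {
                out.add(l.getKey() + ":" + l.getValue() + "," + r.getValue());
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> a =
            List.of(Map.entry(0, "a0"), Map.entry(1, "a1"), Map.entry(3, "a3"));
        List<Map.Entry<Integer, String>> b =
            List.of(Map.entry(1, "b1"), Map.entry(2, "b2"), Map.entry(3, "b3"));
        System.out.println(innerJoin(a, b));
    }
}
```

Each input is read exactly once and in order, which is why the join only works when the inputs are pre-sorted and identically partitioned; the framework's join package handles lining up the splits so the mapper sees matching records together.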

iPhone'd

On Jan 14, 2012, at 12:03, Sean Owen <[email protected]> wrote:

> True that but I think most of the use of .mapred. is not of this form.
> It's still using the old Mappers and Reducers and InputFormats and
> such. Maybe it's all actually somehow necessary to still use
> ChainReducer or MultipleInputs though my impression was that most of
> it was not.
>
> For example right now I see no use of MultipleInputs, ChainMapper or
> ChainReducer. There are some uses of MultipleOutputs, in ssvd. But for
> example I do not see anything that keeps the Bayes code from using
> .mapreduce., and I think this is most of what I'm referring to. Is
> anyone working on this anymore?
>
> On Sat, Jan 14, 2012 at 4:52 PM, Jake Mannix <[email protected]> wrote:
>> Re: o.a.h.mapred package dependency: haven't we been over this a thousand
>> times?
>>
>> If we are not *forcing* our users to upgrade Hadoop past 0.20.2-ish, and we
>> want to have nice things like mapside joins, ChainMapper/ChainReducer, and
>> MultipleOutputs, then we're sometimes stuck in the old-and-faded API of
>> yesteryear (org.apache.hadoop.mapred). Am I forgetting some trick which
>> allows us to get around this, or some decision we made which makes this not
>> relevant?
>>
>> -jake
>>
