Look for uses of DistributedRowMatrix in the Mahout code. The existing Mahout jobs are generally end-to-end algorithm implementations that perform steps like matrix multiplication internally. Also, the Mahout algorithms generally prefer sparse data for distributed work.
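To make the row-wise, sparse pattern concrete, here is a minimal single-process sketch. It assumes the matrix is stored as sparse rows keyed by row index and that the vector is small enough to ship to every task. It does not use Mahout's API; `map_row` and `times_vector` are hypothetical names. The toy data mirrors the co-occurrence matrix × user vector case that comes up later in the thread, with illustrative values only.

```python
from typing import Dict, Iterable, Tuple

# A sparse row or vector: column index -> non-zero value.
SparseRow = Dict[int, float]


def map_row(row_id: int, row: SparseRow, vector: SparseRow) -> Tuple[int, float]:
    """Hypothetical mapper: dot product of one sparse row with the shipped vector.

    Only the row's non-zero entries are visited, so the work per row is
    proportional to its number of non-zeros.
    """
    return row_id, sum(value * vector.get(col, 0.0) for col, value in row.items())


def times_vector(rows: Iterable[Tuple[int, SparseRow]], vector: SparseRow) -> SparseRow:
    """Multiply a row-partitioned sparse matrix by a vector.

    Each row is independent, so rows (or splits of rows) could go to different
    map tasks; each output entry comes from exactly one row, so no matrix data
    needs to be shuffled between tasks. Zero results are dropped (sparse output).
    """
    result: SparseRow = {}
    for row_id, row in rows:
        key, value = map_row(row_id, row, vector)
        if value != 0.0:
            result[key] = value
    return result


if __name__ == "__main__":
    # Toy item-item co-occurrence counts (illustrative values, not real data).
    cooccurrence = [
        (0, {0: 2.0, 1: 1.0}),
        (1, {0: 1.0, 1: 3.0, 2: 1.0}),
        (2, {1: 1.0, 2: 1.0}),
    ]
    user_prefs = {0: 1.0, 2: 4.0}  # one user's known preferences
    print(times_vector(cooccurrence, user_prefs))  # {0: 2.0, 1: 5.0, 2: 4.0}
```

The design choice here is that only the small operand is replicated; the large matrix is read once and never moved between tasks, which is the kind of job that suits Hadoop well.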
What is a "large" matrix? You may find that you don't really need to go to the effort of using Hadoop.

Lance

On Sat, Nov 19, 2011 at 3:07 PM, Stephen Boesch <java...@gmail.com> wrote:
> Hi,
> Two solutions have been suggested that take advantage of either (a)
> vector x matrix (your CF / Mahout example) or (b) small matrix x large
> matrix (an earlier suggestion of putting the small matrix into the
> Distributed Cache). It is not yet clear what good approaches exist for
> (c) large matrix x large matrix.
>
> 2011/11/19 <bejoy.had...@gmail.com>
>
> > Hey Mike,
> > In Mahout, one place where matrix multiplication is used is the
> > distributed Collaborative Filtering implementation. The recommendations
> > there are generated by multiplying a co-occurrence matrix with a
> > user vector. The user vector is treated as a single-column matrix, and
> > the matrix multiplication takes place there.
> >
> > Regards
> > Bejoy K S
> >
> > -----Original Message-----
> > From: Mike Spreitzer <mspre...@us.ibm.com>
> > Date: Fri, 18 Nov 2011 14:52:05
> > To: <common-user@hadoop.apache.org>
> > Reply-To: common-user@hadoop.apache.org
> > Subject: RE: Matrix multiplication in Hadoop
> >
> > Well, this mismatch may tell me something interesting about Hadoop. Matrix
> > multiplication has a lot of inherent parallelism, so from very crude
> > considerations it is not obvious that there should be a mismatch. Why is
> > matrix multiplication ill-suited for Hadoop?
> >
> > BTW, I looked into the Mahout documentation some, and did not find matrix
> > multiplication there. It might be hidden inside one of the advertised
> > algorithms; I looked at the documentation for a few, but did not notice
> > any mention of MM.
> >
> > Thanks,
> > Mike
> >
> > From: Michael Segel <michael_se...@hotmail.com>
> > To: <common-user@hadoop.apache.org>
> > Date: 11/18/2011 01:49 PM
> > Subject: RE: Matrix multiplication in Hadoop
> >
> > Ok Mike,
> >
> > First, I admire that you are studying Hadoop.
> >
> > To answer your question... not well.
> >
> > Might I suggest that, if you want to learn Hadoop, you try to find a
> > problem that can easily be broken into a series of parallel tasks with
> > minimal communication requirements between tasks?
> >
> > No offense, but if I could draw a parallel... what you're asking is akin
> > to taking a normalized relational model and trying to run it as-is in
> > HBase. Yes, it can be done, but it is not the best use of resources.
> >
> > > To: common-user@hadoop.apache.org
> > > CC: common-user@hadoop.apache.org
> > > Subject: Re: Matrix multiplication in Hadoop
> > > From: mspre...@us.ibm.com
> > > Date: Fri, 18 Nov 2011 12:39:00 -0500
> > >
> > > That's also an interesting question, but right now I am studying Hadoop
> > > and want to know how well dense MM can be done in Hadoop.
> > >
> > > Thanks,
> > > Mike
> > >
> > > From: Michel Segel <michael_se...@hotmail.com>
> > > To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
> > > Date: 11/18/2011 12:34 PM
> > > Subject: Re: Matrix multiplication in Hadoop
> > >
> > > Is Hadoop the best tool for doing large matrix math?
> > > Sure, you can do it, but aren't there better tools for these types of
> > > problems?
> > >
> > > Mike Segel

--
Lance Norskog
goks...@gmail.com
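For case (c), large matrix x large matrix, which the thread leaves open, one textbook formulation (not something proposed in the thread) is a single MapReduce pass keyed by output cell (i, k). The sketch below simulates it in one process, with a dictionary standing in for Hadoop's shuffle; `map_phase`, `reduce_phase`, and `multiply` are hypothetical names. It also counts shuffled records, which makes the communication cost discussed above visible.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Key = Tuple[int, int]            # (row of C, column of C)
Record = Tuple[str, int, float]  # (source matrix, shared index j, value)


def map_phase(a: Matrix, b: Matrix) -> Dict[Key, List[Record]]:
    """Emit every input entry once per output cell it contributes to.

    A[i][j] is needed by every C[i][k], so it is sent p times;
    B[j][k] is needed by every C[i][k], so it is sent m times.
    The returned dict plays the role of the shuffle (group by key).
    """
    m, n, p = len(a), len(b), len(b[0])
    shuffle: Dict[Key, List[Record]] = defaultdict(list)
    for i in range(m):
        for j in range(n):
            for k in range(p):
                shuffle[(i, k)].append(("A", j, a[i][j]))
    for j in range(n):
        for k in range(p):
            for i in range(m):
                shuffle[(i, k)].append(("B", j, b[j][k]))
    return shuffle


def reduce_phase(values: List[Record]) -> float:
    """Join the A and B entries on the shared index j and sum the products."""
    a_by_j = {j: v for tag, j, v in values if tag == "A"}
    b_by_j = {j: v for tag, j, v in values if tag == "B"}
    return sum(a_by_j[j] * b_by_j[j] for j in a_by_j if j in b_by_j)


def multiply(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Return (C = A x B, number of records that went through the shuffle)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    shuffle = map_phase(a, b)
    m, p = len(a), len(b[0])
    c = [[reduce_phase(shuffle[(i, k)]) for k in range(p)] for i in range(m)]
    records = sum(len(v) for v in shuffle.values())
    return c, records


if __name__ == "__main__":
    c, records = multiply([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
    print(c)        # [[19.0, 22.0], [43.0, 50.0]]
    print(records)  # 16, i.e. 2*m*n*p for m = n = p = 2
```

For two dense N x N matrices this pass shuffles 2N³ records from only 2N² input entries, so network traffic grows a factor of N faster than the data itself. Blocked variants, which ship sub-matrices instead of single entries, cut that volume by roughly the block edge length, but the shuffle still outgrows the input. This is one way to read the "minimal communication" point made above.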