Re: Matrix multiplication in Hadoop

Lance Norskog Sat, 19 Nov 2011 15:34:29 -0800

Look for uses of the DistributedRowMatrix in the Mahout code. The existing
Mahout jobs are generally end-to-end algorithm implementations which do
things like matrix multiplication in the middle. Also, the Mahout
algorithms generally prefer to use sparse data for distributed work.
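
As a rough illustration of why sparse data suits this style of job, here is a minimal single-process sketch of a map/reduce-shaped sparse multiply. It is not Mahout code: the function name `multiply_sparse` and the `{(row, col): value}` layout are assumptions made up for this example, and the "map" and "reduce" phases are only simulated with dictionaries.

```python
from collections import defaultdict


def multiply_sparse(a, b):
    """Multiply two sparse matrices stored as {(row, col): value} dicts.

    Hypothetical helper for illustration only; it mimics the shape of a
    map/reduce join on the shared inner index, not any real Hadoop API.
    """
    # "Map" phase: group the non-zero entries of A and B by inner index k.
    a_by_k = defaultdict(list)  # k -> [(i, a_ik)]
    b_by_k = defaultdict(list)  # k -> [(j, b_kj)]
    for (i, k), value in a.items():
        a_by_k[k].append((i, value))
    for (k, j), value in b.items():
        b_by_k[k].append((j, value))

    # "Reduce" phase: for each k, form the partial products a_ik * b_kj
    # and sum them into the output cell (i, j).
    c = defaultdict(float)
    for k, a_entries in a_by_k.items():
        for i, a_ik in a_entries:
            for j, b_kj in b_by_k.get(k, []):
                c[(i, j)] += a_ik * b_kj
    return dict(c)


if __name__ == "__main__":
    a = {(0, 0): 1.0, (0, 2): 2.0, (1, 1): 3.0}
    b = {(0, 1): 4.0, (1, 0): 5.0, (2, 1): 6.0}
    print(multiply_sparse(a, b))
```

Only non-zero entries are ever grouped or multiplied, so the work scales with the number of non-zeros rather than with the full matrix dimensions.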


What is a "large" matrix? You may find that you really don't need to go to
the effort of using Hadoop.
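
A back-of-the-envelope way to check is to compute the memory a dense matrix needs. This is only a sketch: the helper name `dense_matrix_bytes` is made up here, and it assumes 8-byte double-precision values with no storage overhead.

```python
def dense_matrix_bytes(rows, cols, bytes_per_value=8):
    """Raw storage for a dense rows x cols matrix (assumes 8-byte doubles)."""
    return rows * cols * bytes_per_value


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        gib = dense_matrix_bytes(n, n) / 2**30
        print(f"{n} x {n}: {gib:.2f} GiB")
```

Under these assumptions a 10,000 x 10,000 matrix is roughly 0.75 GiB, which may fit in memory on a single machine; at 100,000 x 100,000 it is roughly 75 GiB.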

Lance

On Sat, Nov 19, 2011 at 3:07 PM, Stephen Boesch <java...@gmail.com> wrote:

> Hi,
>   there are two solutions suggested that take advantage of either (a) a
> vector x matrix (your CF / Mahout example) or (b) a small matrix x large
> matrix (an earlier suggestion of putting the small matrix into the
> Distributed Cache). It is not yet clear what good approaches exist for
> (c) large matrix x large matrix.
>
>
> 2011/11/19 <bejoy.had...@gmail.com>
>
> > Hey Mike
> >          In Mahout, one place where matrix multiplication is used is in
> > the distributed Collaborative Filtering implementation. The recommendations
> > here are generated by multiplying a co-occurrence matrix by a
> > user vector. This user vector is treated as a single-column matrix, and
> > then the matrix multiplication takes place.
> >
> > Regards
> > Bejoy K S
> >
> > -----Original Message-----
> > From: Mike Spreitzer <mspre...@us.ibm.com>
> > Date: Fri, 18 Nov 2011 14:52:05
> > To: <common-user@hadoop.apache.org>
> > Reply-To: common-user@hadoop.apache.org
> > Subject: RE: Matrix multiplication in Hadoop
> >
> > Well, this mismatch may tell me something interesting about Hadoop.
> > Matrix multiplication has a lot of inherent parallelism, so from very crude
> > considerations it is not obvious that there should be a mismatch.  Why is
> > matrix multiplication ill-suited for Hadoop?
> >
> > BTW, I looked into the Mahout documentation some, and did not find matrix
> > multiplication there.  It might be hidden inside one of the advertised
> > algorithms; I looked at the documentation for a few, but did not notice
> > mention of MM.
> >
> > Thanks,
> > Mike
> >
> >
> >
> > From:   Michael Segel <michael_se...@hotmail.com>
> > To:     <common-user@hadoop.apache.org>
> > Date:   11/18/2011 01:49 PM
> > Subject:        RE: Matrix multiplication in Hadoop
> >
> >
> >
> >
> > Ok Mike,
> >
> > First I admire that you are studying Hadoop.
> >
> > To answer your question... not well.
> >
> > Might I suggest that if you want to learn Hadoop, you try to find a
> > problem which can easily be broken into a series of parallel tasks where
> > there are minimal communication requirements between each task?
> >
> > No offense, but if I could make a parallel... what you're asking is akin
> > to taking a normalized relational model and trying to run it as is in
> > HBase.
> > Yes it can be done. But not the best use of resources.
> >
> > > To: common-user@hadoop.apache.org
> > > CC: common-user@hadoop.apache.org
> > > Subject: Re: Matrix multiplication in Hadoop
> > > From: mspre...@us.ibm.com
> > > Date: Fri, 18 Nov 2011 12:39:00 -0500
> > >
> > > That's also an interesting question, but right now I am studying Hadoop
> > > and want to know how well dense MM can be done in Hadoop.
> > >
> > > Thanks,
> > > Mike
> > >
> > >
> > >
> > > From:   Michel Segel <michael_se...@hotmail.com>
> > > To:     "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
> > > Date:   11/18/2011 12:34 PM
> > > Subject:        Re: Matrix multiplication in Hadoop
> > >
> > >
> > >
> > > Is Hadoop the best tool for doing large matrix math?
> > > Sure you can do it, but, aren't there better tools for these types of
> > > problems?
> > >
> > >
> > > Sent from a remote device. Please excuse any typos...
> > >
> > > Mike Segel
> > >
> >
> >
> >
>



-- 
Lance Norskog
goks...@gmail.com
