This is also a project called HAMA <http://wiki.apache.org/hama/>. Perhaps your efforts can help further (or coordinate with) HAMA?
Best regards, Avram On Wednesday, December 09, 2009, at 10:05AM, "Ted Dunning" <[email protected]> wrote: >See here: > >http://arxiv.org/abs/0909.4061v1 > >On Tue, Dec 8, 2009 at 6:47 PM, Zhengguo 'Mike' SUN ><[email protected]>wrote: > >> Hi Jake, >> >> I am implementing the classical multiplicative update rule of NMF. The >> matrix to be factorized is really big and sparse. Are you suggesting that I >> can use some specialised algorithms for sparse matrix instead of the >> standard multiplication algorithm? But what algorithms are you referring to? >> Could you please provide some pointers? >> >> Thanks >> >> >> >> >> ________________________________ >> From: Jake Mannix <[email protected]> >> To: [email protected] >> Sent: Tue, December 8, 2009 8:53:39 PM >> Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication >> >> Hi Mike, >> >> In the NMF work you're doing, the matrices you're multiplying are very >> thin, right? They're a handful of dense vectors on the left, an equal >> number of dense vectors on the right, and you're only computing the values >> in the outer product which coincide with the nonzero entries of the >> (sparse) >> input matrix? In which case you're not really doing the full matrix >> multiplication, and doing this a slightly specialized way should provide >> some serious speedup. >> >> Unless you're not dealing with the "really big, but sparse" case, and >> minimizing error only over the nonzero inputs? >> >> -jake >> >> On Tue, Dec 8, 2009 at 5:43 PM, Zhengguo 'Mike' SUN >> <[email protected]>wrote: >> >> > Hi Ted, John, >> > >> > I am also interested in matrix multiplication. Actually, I am >> implementing >> > Non-negative Matrix Factorization, which basically is iterations of >> matrix >> > multiplication. What I did is exactly as what Ted said: doing an outer >> > product using one column from left matrix and one row from right matrix >> and >> > summing up all the results. Idealy, this could be done using only one >> job: >> > mappers doing multiplication and reducers doing summing. But the thing is >> > that how do you match a column from left matrix and a row from right >> matrix. >> > Assuming that two input matrices are give in column major order and row >> > major order, it seems to me that you have to implement a customized >> > InputFormat to do the matching. Or maybe you could use one matrix as >> input >> > and read the other inside mappers, but this also has problem when you >> have a >> > lot of mappers. The straightforward way seems to be using two jobs. The >> > mappers of the first job emit the id and the content of the row or column >> it >> > reads and the reducers do the multiplication. The second job is to do >> the >> > summing. This is kind of similar to what John did. But it is inefficient >> if >> > you have many iterations. Any comments? >> > >> > >> > >> > >> > ________________________________ >> > From: Ted Dunning <[email protected]> >> > To: [email protected]; mahout-user < >> [email protected] >> > > >> > Sent: Tue, December 8, 2009 2:59:06 PM >> > Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication >> > >> > John, >> > >> > Your comment about matrix multiplication was forwarded to the mahout-user >> > emailing list. >> > >> > I have a question for you about your approach. Have you considered doing >> > the multiplication in a single step by storing the the first matrix in >> > column major order and the second in row major order? The idea would be >> to >> > do a block-wise outer product and use the combiner to sum elements ahead >> of >> > the reducer. This could be done by reading a (block) column from the >> left >> > matrix and a (block) row from the second and emitting all of the (block) >> > elements of the outer product of this column and row. The key emitted >> with >> > these blocks would be the i,j index of the final location of each block >> in >> > the final product. The combiner and reducer could add all of the blocks >> > together and the use of the combiner would serve to substantially >> minimize >> > the amount of network traffic. >> > >> > This alternative is equivalent to inverting the loop nesting in a >> > conventional matrix multiplication algorithm. It is also very closely >> > related to the way that cooccurrence counting is typically done in >> Hadoop. >> > >> > Is your project an ongoing effort? >> > >> > ---------- Forwarded message ---------- >> > From: Robin Anil <[email protected]> >> > Date: Tue, Dec 8, 2009 at 11:09 AM >> > Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication >> > To: [email protected] >> > Cc: John Norstad <[email protected]> >> > >> > >> > ---------- Forwarded message ---------- >> > From: John Norstad <[email protected]> >> > Date: Tue, Dec 8, 2009 at 10:14 PM >> > Subject: A MapReduce Algorithm for Matrix Multiplication >> > To: MapReduce <[email protected]> >> > >> > >> > As an exercise while learning MapReduce, I developed an algorithm for >> > matrix multiplication and wrote it up on my web site. If you're >> > interested, it's at: >> > >> > http://homepage.mac.com/j.norstad/matrix-multiply >> > >> > I present the algorithm in English, as pseudo-code, and as Java source >> > code for Hadoop. >> > >> > -- >> > John Norstad >> > Academic and Research Technologies >> > Northwestern University >> > >> > >> > >> > -- >> > Ted Dunning, CTO >> > DeepDyve >> > >> > >> > >> > >> > >> >> >> >> > > > > >-- >Ted Dunning, CTO >DeepDyve > >
