John, Your comment about matrix multiplication was forwarded to the mahout-user emailing list.
I have a question for you about your approach. Have you considered doing the multiplication in a single step by storing the the first matrix in column major order and the second in row major order? The idea would be to do a block-wise outer product and use the combiner to sum elements ahead of the reducer. This could be done by reading a (block) column from the left matrix and a (block) row from the second and emitting all of the (block) elements of the outer product of this column and row. The key emitted with these blocks would be the i,j index of the final location of each block in the final product. The combiner and reducer could add all of the blocks together and the use of the combiner would serve to substantially minimize the amount of network traffic. This alternative is equivalent to inverting the loop nesting in a conventional matrix multiplication algorithm. It is also very closely related to the way that cooccurrence counting is typically done in Hadoop. Is your project an ongoing effort? ---------- Forwarded message ---------- From: Robin Anil <[email protected]> Date: Tue, Dec 8, 2009 at 11:09 AM Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication To: [email protected] Cc: John Norstad <[email protected]> ---------- Forwarded message ---------- From: John Norstad <[email protected]> Date: Tue, Dec 8, 2009 at 10:14 PM Subject: A MapReduce Algorithm for Matrix Multiplication To: MapReduce <[email protected]> As an exercise while learning MapReduce, I developed an algorithm for matrix multiplication and wrote it up on my web site. If you're interested, it's at: http://homepage.mac.com/j.norstad/matrix-multiply I present the algorithm in English, as pseudo-code, and as Java source code for Hadoop. -- John Norstad Academic and Research Technologies Northwestern University -- Ted Dunning, CTO DeepDyve
