I am looking at large dense matrix multiplication as an example problem 
for a class of middleware.  I am also interested in sparse matrices, but 
am taking things one step at a time.

There is a paper in IEEE CloudCom '10 about Hama, including a matrix 
multiplication technique.  It is essentially the same as what is called 
"technique 4" in the 2009 monograph by John Norstad cited early in this 
thread.  This means that, although Hama touts the virtues of BSP (a 
position with which I am very sympathetic), the technique doesn't really 
take advantage of the extra features BSP has over MapReduce. 
Note also that this technique creates intermediate data of much greater 
volume than the input.  For example, if each matrix is stored as an NxN 
grid of blocks, the intermediate data (the blocks paired up, awaiting 
multiplication) is a factor of N larger than the input.  I have heard 
people say that N may be considerably larger than sqrt(number of 
machines), because in some circumstances N must be chosen before the 
number of available machines is known and you want to be able to divide 
the NxN load among your machines fairly evenly.  Even if N is only about 
sqrt(number of machines), that is still an unwelcome amount of bloat.  In 
comparison, the SUMMA technique does the same matrix multiplication with 
intermediate data volume no greater than the input.
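To make the blowup concrete, here is a small sketch (the function name and
structure are mine, not from the Hama paper) counting block records in the
"pair up blocks, then multiply" scheme (Norstad's technique 4) for an NxN
block grid:

```python
def technique4_blowup(n):
    """Blowup factor of intermediate data over input for technique 4.

    Each product block C[i][j] needs the pair (A[i][k], B[k][j]) for
    k = 0..n-1, so every one of the 2*n^2 input blocks is replicated
    n times in the intermediate data.
    """
    input_blocks = 2 * n * n        # A and B, each an n x n grid of blocks
    intermediate_blocks = 2 * n**3  # each input block sent to n pairings
    return intermediate_blocks // input_blocks

for n in (2, 4, 8):
    print(n, technique4_blowup(n))  # the factor grows linearly with n
```

SUMMA avoids this because at each step it only broadcasts one block row of
A and one block column of B, so the data live at any moment stays
proportional to the input.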

Thanks,
Mike
