There are algorithms that need very few supersteps, for example the matrix-vector multiplication in GSoC this year. Keeping the number of supersteps low makes sense, since global synchronization is very expensive.
However, Canopy clustering does not fit BSP very well, since it has a parallel part and a sequential part. So MapReduce is a good fit for Canopy clustering.

On 7 April 2012 at 15:19, Praveen Sripati <[email protected]> wrote:

> Hi,
>
> After Thomas's implementation of K-Means (3) I was motivated to extend it
> using Canopy clustering. So, I started looking at the MR implementation
> of Canopy (1) and (2). The MR implementation of Canopy clustering is done
> in two MR phases: the first to identify the canopies and the second to
> assign canopies to the data points. I don't see much improvement when this
> is done using BSP. Please correct me if I am wrong.
>
> Also, are there any algorithms which can be implemented easily (for those
> who are getting started with Hama/BSP like me) on Hama/BSP where we could
> also see some performance improvements when compared to the MR
> implementation? I have seen Mahout and there are many algorithms
> implemented in it, and I would like to see something similar in Hama also.
>
> Thanks,
> Praveen
>
> (1) - http://horicky.blogspot.in/2011/04/k-means-clustering-in-map-reduce.html
> (2) - https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering
> (3) - http://codingwiththomas.blogspot.in/2011/12/k-means-clustering-with-bsp-intuition.html

--
Thomas Jungblut
Berlin <[email protected]>
