> It makes sense, since global sync is very expensive.

For any iterative algorithm like k-means, the # of global syncs is proportional to the # of iterations. So, how does k-means fit in BSP?
Also, why is global sync very expensive? Is it because all the processes have to enter the barrier synchronisation before the next superstep starts?

Praveen

On Tue, Apr 10, 2012 at 12:30 PM, Thomas Jungblut <[email protected]> wrote:

> There are algorithms that have very few supersteps, see the Matrix-Vector
> Multiplication in GSoC this year.
> It makes sense, since global sync is very expensive.
>
> However, Canopy clustering does not fit very well, since there is a
> parallel part and a sequential part.
> So MapReduce is a good fit for canopy clustering.
>
> On 7 April 2012 at 15:19, Praveen Sripati <[email protected]> wrote:
>
> > Hi,
> >
> > After Thomas' implementation of K-Means (3) I was motivated to extend it
> > using Canopy clustering. So, I started looking at the MR implementation
> > of Canopy (1) and (2). The MR implementation of Canopy clustering is done
> > in two MR phases, the first to identify the canopies and the second to
> > assign canopies to the data points. I don't see much improvement when this
> > is done using BSP. Please correct me if I am wrong.
> >
> > Also, are there any algorithms which can be implemented easily (for those
> > who are getting started with Hama/BSP like me) on Hama/BSP where we could
> > also see some performance improvements when compared to the MR
> > implementation? I have seen Mahout and there are many algorithms
> > implemented in it and would like to see something similar in Hama also.
> >
> > Thanks,
> > Praveen
> >
> > (1) - http://horicky.blogspot.in/2011/04/k-means-clustering-in-map-reduce.html
> > (2) - https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering
> > (3) - http://codingwiththomas.blogspot.in/2011/12/k-means-clustering-with-bsp-intuition.html
>
> --
> Thomas Jungblut
> Berlin <[email protected]>
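
To make the point about syncs and iterations concrete, here is a rough sketch of a BSP-style k-means on 1-D points. It is not Hama code and does not use the Hama API: the peers are plain Python threads, `threading.Barrier` stands in for the global sync, and the names `bsp_kmeans`, `peer`, `combine` and `inbox` are made up for this illustration. Each iteration is one superstep (local assignment, "send" partial sums, barrier), so the sync counter ends up equal to the number of iterations. It also shows where the cost comes from: no peer can start the next superstep until the slowest peer has reached the barrier.

```python
import threading


def bsp_kmeans(partitions, centroids, max_iters=20):
    """Toy BSP-style 1-D k-means: one thread per peer, one barrier per superstep."""
    n_peers = len(partitions)
    k = len(centroids)
    state = {"centroids": list(centroids), "syncs": 0, "done": False}
    inbox = [None] * n_peers  # slot i holds the partial sums "sent" by peer i

    def combine():
        # Run by exactly one thread per barrier, after every peer has arrived
        # and before any peer is released into the next superstep.
        state["syncs"] += 1
        new = []
        for c in range(k):
            total = sum(msg[c][0] for msg in inbox)
            count = sum(msg[c][1] for msg in inbox)
            # Keep the old centroid if no point was assigned to it.
            new.append(total / count if count else state["centroids"][c])
        state["done"] = new == state["centroids"]
        state["centroids"] = new

    barrier = threading.Barrier(n_peers, action=combine)

    def peer(i):
        for _ in range(max_iters):
            cents = state["centroids"]
            # Local computation: assign each local point to its nearest centroid.
            partial = [[0.0, 0] for _ in range(k)]
            for x in partitions[i]:
                nearest = min(range(k), key=lambda c: abs(x - cents[c]))
                partial[nearest][0] += x
                partial[nearest][1] += 1
            inbox[i] = partial   # communication phase
            barrier.wait()       # global sync: wait for the slowest peer
            if state["done"]:
                break

    threads = [threading.Thread(target=peer, args=(i,)) for i in range(n_peers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["centroids"], state["syncs"]


if __name__ == "__main__":
    cents, syncs = bsp_kmeans([[1.0, 2.0, 9.0], [1.5, 10.0, 11.0]], [0.0, 5.0])
    print(cents, syncs)
```

In this sketch the per-iteration work is trivial, so the barrier dominates; with real partitions the balance depends on how much local computation each superstep does between syncs.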
