Re: Canopy Clustering on BSP

Thomas Jungblut Tue, 10 Apr 2012 00:01:15 -0700

There are algorithms that need only very few supersteps, see the Matrix-Vector
Multiplication in GSoC this year.
Those are the ones that make sense on BSP, since global synchronization is
very expensive.


However, Canopy clustering does not fit very well, since it has a
parallel part and a sequential part.
So MapReduce is a good fit for canopy clustering.
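
To illustrate the split, here is a rough sketch in plain Python (not Hama or
Mahout code). The function names, the tuple representation of points, and the
thresholds t1 > t2 are assumptions made for this example only. Picking the
canopy centers is a greedy loop where each step depends on the previous one,
while assigning points to canopies treats every point independently.

```python
from math import dist  # Euclidean distance, Python 3.8+


def find_canopies(points, t2):
    """Sequential part: greedily pick canopy centers.

    Each iteration depends on which points the previous centers removed,
    so this loop cannot simply be split across independent tasks.
    """
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        # Points within the tight threshold t2 can no longer become centers.
        remaining = [p for p in remaining if dist(p, center) > t2]
    return centers


def assign_canopies(points, centers, t1):
    """Parallel part: every point is checked against the centers on its own.

    A point belongs to each canopy whose center lies within the loose
    threshold t1, so a point may end up in more than one canopy.
    """
    return {p: [c for c in centers if dist(p, c) <= t1] for p in points}
```

The second function is the part that maps naturally onto independent tasks;
the first one is the sequential step.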

On 7 April 2012 15:19, Praveen Sripati <[email protected]> wrote:

> Hi,
>
> After Thomas's implementation of K-Means (3) I was motivated to extend it
> using Canopy clustering. So, I started looking at the MR implementation
> of Canopy (1) and (2). The MR implementation of Canopy clustering is done
> in two MR phases: the first identifies the canopies and the second assigns
> canopies to the data points. I don't see much improvement when this is done
> using BSP. Please correct me if I am wrong.
>
> Also, are there any algorithms which can be implemented easily (for those
> who are getting started with Hama/BSP like me) on Hama/BSP where we could
> also see some performance improvements compared to the MR implementation?
> I have looked at Mahout, which has many algorithms implemented, and would
> like to see something similar in Hama as well.
>
> Thanks,
> Praveen
>
> (1) - http://horicky.blogspot.in/2011/04/k-means-clustering-in-map-reduce.html
> (2) - https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering
> (3) - http://codingwiththomas.blogspot.in/2011/12/k-means-clustering-with-bsp-intuition.html
>



-- 
Thomas Jungblut
Berlin <[email protected]>
