Each cluster takes storage space equal to the number of 1 bits in the Or()
of all of its points, and the nearest cluster to a point is the one whose
size does not change after including that point. So the Or() of the bits of
a cluster's points can serve as a representative of that cluster.
It seems that I need a kind of subclass of AbstractCluster that keeps
track of this Or() bit vector.
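To make the idea concrete, here is a minimal sketch in plain Java using java.util.BitSet. It is not Mahout code: OrCluster, growth() and bits() are hypothetical names chosen for illustration, and it does not extend AbstractCluster. It only shows the Or() representative and a "distance" defined as the number of bits a point would add to the cluster.

```java
import java.util.BitSet;

// Hypothetical helper, not part of Mahout: a cluster represented by
// the OR of the bit vectors of all points assigned to it.
class OrCluster {
    private final BitSet orBits = new BitSet();

    // Storage cost of the cluster: number of 1 bits in the OR of its points.
    int size() {
        return orBits.cardinality();
    }

    // How much the cluster would grow if the point were added:
    // the number of bits set in the point but not yet in the cluster.
    // A growth of 0 means the cluster size does not change.
    int growth(BitSet point) {
        BitSet extra = (BitSet) point.clone();
        extra.andNot(orBits);
        return extra.cardinality();
    }

    // Fold a point into the representative.
    void add(BitSet point) {
        orBits.or(point);
    }

    // Convenience builder for small examples.
    static BitSet bits(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) {
            b.set(i);
        }
        return b;
    }

    public static void main(String[] args) {
        OrCluster a = new OrCluster();
        a.add(bits(0, 1));
        a.add(bits(1, 2));

        OrCluster b = new OrCluster();
        b.add(bits(5, 6));

        BitSet p = bits(0, 2);
        System.out.println("growth in a: " + a.growth(p));
        System.out.println("growth in b: " + b.growth(p));
    }
}
```

In this sketch the point would be assigned to the cluster with the smallest growth. Note that the Or() vector can only gain bits as points are added, so a real implementation would probably need to recompute it from scratch at each iteration.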
On 7/16/2012 11:22 PM, Sean Owen wrote:
Hmm, is that going to give a point that acts like a centroid, though, that is,
a "mid point" under some distance metric? I don't think you want to do this.
On Tue, Jul 17, 2012 at 5:04 AM, Masoud Moshref Javadi <[email protected]> wrote:
I want to run k-means on binary data, and the definition of centroid for my
application is the Or() of the bits of all points inside a cluster.
Where in Mahout should I make this change?
--
Masoud Moshref Javadi
Computer Engineering PhD Student
Ming Hsieh Department of Electrical Engineering
University of Southern California