Hello everyone,

I'm using the k-means example as the basis for a custom implementation and
I noticed the following behavior: if no point is assigned to a particular
cluster during an iteration, that cluster "disappears".
This happens because SelectNearestCenter() outputs <centroidId, point>
tuples (where centroidId is the center chosen by the point), and these are
then grouped by centroidId to compute the new centers. If no point selects
a particular centroid, that centroid will not appear in subsequent
iterations.

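For example, assume we have the points
{ (-10, 0), (-8, 0), (2, 0) } and the initial centroids {1, (0, 0)} and {2,
(5, 0)}.
Initially, all three points, including (2, 0), are assigned to centroid 1,
so centroid 2 receives no points and is dropped. Centroid 1 then moves
closer to (-10, 0), to roughly (-5.33, 0). At that point (2, 0) is nearer
to (5, 0), but it cannot be reassigned to cluster 2 because that centroid
no longer exists.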
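
To make the example concrete, here is a minimal plain-Python sketch of the
group-by-centroidId logic described above. It is not the code of the k-means
example itself; the names nearest() and step() are made up for illustration,
and it assumes squared Euclidean distance.

```python
from collections import defaultdict

def nearest(point, centroids):
    # Id of the centroid closest to the point (squared Euclidean distance).
    return min(
        centroids,
        key=lambda cid: (point[0] - centroids[cid][0]) ** 2
                        + (point[1] - centroids[cid][1]) ** 2,
    )

def step(points, centroids):
    # Build (centroidId, point) pairs and group them by centroidId.
    groups = defaultdict(list)
    for p in points:
        groups[nearest(p, centroids)].append(p)
    # Only centroids that received at least one point yield a new center,
    # so a centroid with no points is silently dropped here.
    return {
        cid: (sum(x for x, _ in ps) / len(ps), sum(y for _, y in ps) / len(ps))
        for cid, ps in groups.items()
    }

points = [(-10, 0), (-8, 0), (2, 0)]
centroids = {1: (0, 0), 2: (5, 0)}

after_first = step(points, centroids)
after_second = step(points, after_first)
print(sorted(after_first))   # ids that survive the first iteration
print(sorted(after_second))  # ids that survive the second iteration
```

With these inputs, centroid 2 is missing from the result of the first call,
so the second call has only centroid 1 to choose from.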

Is this intended behavior?
This seemed odd to me, but I couldn't find any resources that define the
"correct" behavior. It seems that handling such a situation is
implementation-specific. If we keep it this way, we might want to add a
comment to the example, though :)
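
For comparison, one possible alternative (just a sketch, not a claim about
what the example should do) is to keep a centroid at its previous position
when no point selects it. The function name step_keep_empty() is
hypothetical, and the distance is again assumed to be squared Euclidean.

```python
def step_keep_empty(points, centroids):
    # Per-centroid accumulators [sum_x, sum_y, count], one per existing
    # centroid, so centroids with no points still show up in the result.
    sums = {cid: [0.0, 0.0, 0] for cid in centroids}
    for px, py in points:
        cid = min(
            centroids,
            key=lambda c: (px - centroids[c][0]) ** 2
                          + (py - centroids[c][1]) ** 2,
        )
        sums[cid][0] += px
        sums[cid][1] += py
        sums[cid][2] += 1
    # Average where points were assigned; otherwise keep the old position.
    return {
        cid: (sx / n, sy / n) if n else centroids[cid]
        for cid, (sx, sy, n) in sums.items()
    }

points = [(-10, 0), (-8, 0), (2, 0)]
centroids = {1: (0, 0), 2: (5, 0)}

first = step_keep_empty(points, centroids)
second = step_keep_empty(points, first)
print(first)
print(second)
```

Under this variant centroid 2 stays at (5, 0) after the first iteration, so
(2, 0) can move to cluster 2 in the second one.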

Cheers,
V.
