Hello everyone,

I'm using the k-means example as the basis for a custom implementation and I noticed the following behavior: if no point is assigned to a particular cluster during an iteration, that cluster "disappears". This happens because SelectNearestCenter() outputs <centroidId, point> tuples (where centroidId is the center chosen by the point), and these are then grouped by centroidId to compute the new centers. If no point selects a particular centroid, there is no group for it, so no new center is computed and the centroid does not appear in subsequent iterations.
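To make the mechanism concrete, here is a minimal plain-Python sketch of one iteration. It is not the code from the example itself: the names `nearest_center`, `kmeans_step` and the `keep_empty` flag are made up for illustration, and the input is the set of points and centroids from the example below. The `keep_empty` branch shows just one possible way to handle an empty cluster (carrying the old centroid over unchanged), not a recommendation.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def nearest_center(point, centroids):
    """Return the id of the centroid closest to `point`."""
    return min(centroids, key=lambda cid: dist(point, centroids[cid]))


def kmeans_step(points, centroids, keep_empty=False):
    """One iteration: assign points, group by centroid id, average each group."""
    groups = defaultdict(list)
    for p in points:
        groups[nearest_center(p, centroids)].append(p)

    # Only ids chosen by at least one point have a group, so a centroid
    # that nobody selected silently drops out of the result here.
    new_centroids = {
        cid: tuple(sum(coord) / len(members) for coord in zip(*members))
        for cid, members in groups.items()
    }

    if keep_empty:
        # Hypothetical alternative: carry unchosen centroids over unchanged.
        for cid, pos in centroids.items():
            new_centroids.setdefault(cid, pos)
    return new_centroids


points = [(-10.0, 0.0), (-8.0, 0.0), (2.0, 0.0)]
centroids = {1: (0.0, 0.0), 2: (5.0, 0.0)}

dropped = kmeans_step(points, centroids)
kept = kmeans_step(points, centroids, keep_empty=True)
print(sorted(dropped))  # [1]
print(sorted(kept))     # [1, 2]
```

With `keep_empty=True`, a second call on the result assigns (2, 0) to centroid 2, which is the reassignment that cannot happen once the centroid has been dropped.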
For example, assume we have the points { (-10, 0), (-8, 0), (2, 0) } and the initial centroids {1, (0, 0)} and {2, (5, 0)}. Initially, all three points, including (2, 0), are assigned to centroid 1, so centroid 2 receives no points and is dropped. Centroid 1 then moves to about (-5.33, 0), closer to (-10, 0), and although (2, 0) is now nearer to (5, 0), it cannot be reassigned to cluster 2 because that centroid no longer exists.

Is this intended behavior? It seemed odd to me, but I couldn't find any resources that define the "correct" behavior. It seems that handling such a situation is implementation-specific. If we keep it this way, we might want to add a comment in the example, though :)

Cheers,
V.