I agree with you. However, just for clarification: do you know why, in this extreme case (where class 0 is much bigger than class 1), the false positive rate might be pretty high, if not 1?
Thank you,

________________________________________
From: Andy [[email protected]]
Sent: Wednesday, November 05, 2014 1:58 PM
To: [email protected]
Subject: Re: [Scikit-learn-general] k-means with unbalanced clusters

On 11/05/2014 01:10 AM, Sturla Molden wrote:
> "Pagliari, Roberto" <[email protected]> wrote:
>
>> If that's the case, why is it that the underlying implementation of k-means
>> does not take this into account?
>
> Because then it would be the "classification EM algorithm" (often called
> CEM) instead of k-means. By definition, k-means is CEM constrained to
> equal cluster sizes and equal, spherical covariance matrices.

If you want different-sized clusters, you might want to look into GMMs, which learn a covariance structure. In KMeans, the cluster structure is always given by the Voronoi cells of the means, which means the border between two clusters lies exactly at the midpoint between their centers.

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
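To make the point above concrete, here is a small sketch (not from the thread; the dataset and parameters are made up for illustration) comparing scikit-learn's KMeans, whose Voronoi boundary sits at the midpoint between the two centers, with GaussianMixture, which learns per-component covariances and can accommodate clusters of very different sizes and spreads:

```python
# Sketch: k-means vs. a Gaussian mixture on two clusters of very
# different sizes. Assumes scikit-learn and NumPy are available;
# the synthetic data below is illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# A large, wide cluster and a small, tight one nearby.
big = rng.normal(loc=0.0, scale=1.0, size=(950, 2))
small = rng.normal(loc=4.0, scale=0.5, size=(50, 2))
X = np.vstack([big, small])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# KMeans cuts space at the perpendicular bisector between the two
# centers, so the tail of the big cluster can spill into the small
# one; the GMM's learned covariances let its boundary shift instead.
km_sizes = np.bincount(km.labels_, minlength=2)
gmm_sizes = np.bincount(gmm.predict(X), minlength=2)
print("KMeans cluster sizes:", sorted(km_sizes))
print("GMM cluster sizes:   ", sorted(gmm_sizes))
```

On data like this, the GMM's smaller component typically stays close to the true 50 points, while k-means tends to hand some of the big cluster's tail to the small one, since its boundary ignores cluster size and spread.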
