> labels = np.unique(labels, return_index=True)[1][labels]

I would personally +1 on this for clustering methods that naturally
support it, like KMeans.
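
For reference, here is a small sketch of what the quoted line does on a toy
label vector. It assumes the labels are integers in 0..k-1 and that every
cluster is non-empty. The function names are made up for illustration and are
not part of scikit-learn. Note that the one-liner maps each label to the index
of the first sample carrying it, so the result is invariant to label
permutations but not contiguous. The second helper is one possible variant
that yields 0..k-1 in order of first appearance.

```python
import numpy as np


def relabel_first_occurrence(labels):
    """One-liner from the thread: each label becomes the index of the
    first sample that carries it (assumes labels are 0..k-1, none empty)."""
    labels = np.asarray(labels)
    first_index = np.unique(labels, return_index=True)[1]
    return first_index[labels]


def relabel_contiguous(labels):
    """Hypothetical variant: labels 0..k-1, numbered by first appearance."""
    labels = np.asarray(labels)
    _, first_index, inverse = np.unique(
        labels, return_index=True, return_inverse=True
    )
    # rank[j] is the position of the j-th sorted unique label once the
    # clusters are ordered by the sample index at which they first appear.
    rank = np.empty_like(first_index)
    rank[np.argsort(first_index)] = np.arange(first_index.size)
    return rank[inverse]


# Two labelings of the same partition that differ only by a permutation.
a = np.array([2, 2, 0, 1, 0])
b = np.array([0, 0, 1, 2, 1])

print(relabel_first_occurrence(a), relabel_first_occurrence(b))
print(relabel_contiguous(a), relabel_contiguous(b))
```

Both labelings map to the same output in either version, which is the
property that matters for testing. Whether contiguous labels are worth the
extra lines is a separate question.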

Alex

On Fri, May 25, 2012 at 9:02 AM, Gael Varoquaux <[email protected]> wrote:
> Hi list,
>
> A lot of clustering algorithms can be initialized randomly and thus, on
> the same data, give different results because of the non-convexity of
> the criterion.
>
> One trivial source of non-reproducibility is the fact that labels can be
> permuted: even if the algorithm finds the same clusters, it may give
> different labels to them. This makes testing and exploration harder,
> but it's easy to fix.
>
> Indeed, if we adopt the convention that, going through the training
> samples in the order in which they are given, cluster labels appear in
> increasing order, all we need to do is add the following line at the
> end of the fit:
>
>     labels = np.unique(labels, return_index=True)[1][labels]
>
> provided that the label ids are not used elsewhere, of course.
>
> I'd like to do this for kmeans, and maybe a few other algorithms where it
> is really easy to do. What do people think?
>
> Gael
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
