> anyone had any thoughts on statistical measurements
> to determine a "reasonable" number of clusters

Some clustering algorithms have their own quality metrics, such as silhouette width or agglomerative coefficient. BIC, AIC, MDL and MML are general quality metrics that penalize complexity in accordance with the ideas of information theory. However, these criteria may not be appropriate if you are not viewing your problem as an example of data compression.

Finally, you may employ cross-validation as described in http://citeseer.nj.nec.com/smyth96clustering.html

--
mag. Aleks Jakulin
http://ai.fri.uni-lj.si/aleks/
Artificial Intelligence Laboratory,
Faculty of Computer and Information Science, University of Ljubljana.
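
As an illustration of the BIC idea mentioned above, here is a minimal sketch in plain Python. It is not taken from the original message or from the cited paper. It assumes one-dimensional data, a Gaussian mixture fitted by EM, and the usual form BIC = -2 log L + p ln n with p = 3k - 1 free parameters. The function names, the quantile-based initialisation, the variance floor and the synthetic data are all hypothetical choices made for the example.

```python
import math
from statistics import NormalDist


def log_normal_pdf(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def gmm_log_likelihood(xs, k, iters=200, var_floor=1e-3):
    """Fit a 1-D Gaussian mixture with k components by EM.

    Returns the log-likelihood evaluated in the last E-step.
    The initialisation and var_floor are assumptions of this sketch.
    """
    n = len(xs)
    ordered = sorted(xs)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    loglik = -math.inf
    for _ in range(iters):
        # E-step: responsibilities, computed in the log domain for stability.
        resp = []
        loglik = 0.0
        for x in xs:
            logs = [math.log(w) + log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, mus, variances)]
            top = max(logs)
            log_total = top + math.log(sum(math.exp(l - top) for l in logs))
            loglik += log_total
            resp.append([math.exp(l - log_total) for l in logs])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            # The floor keeps a component from collapsing onto a single point.
            variances[j] = max(var_j, var_floor)
    return loglik


def bic(xs, k):
    """BIC = -2 log L + p ln n, with p = 3k - 1 for a 1-D mixture."""
    params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return -2 * gmm_log_likelihood(xs, k) + params * math.log(len(xs))


def synthetic_cluster(center, size):
    """Evenly spaced normal quantiles around center (hypothetical test data)."""
    unit = NormalDist()
    return [center + unit.inv_cdf((i + 0.5) / size) for i in range(size)]


data = synthetic_cluster(0.0, 100) + synthetic_cluster(10.0, 100)
scores = {k: bic(data, k) for k in range(1, 5)}
best_k = min(scores, key=scores.get)  # lowest BIC is preferred
print(scores, best_k)
```

The number of clusters with the lowest BIC is taken as the "reasonable" one. The same loop could score each k by held-out log-likelihood instead of BIC, which is closer in spirit to the cross-validation approach, though the details of that method should be taken from the paper itself.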
