Since I posted last night I've been exploring how the two implementations could differ. The underlying algorithms appear to be slightly different, but I think the main difference between the two is how the initial seeds are chosen. In FASTCLUS I believe it's some sort of random selection of values, while in kmeans() it's a random selection of actual observations. The big difference in results I was getting with one particular data set was that FASTCLUS produced one huge, dense cluster (about 85% of the observations) and a bunch of very small ones, while kmeans() produced a much more even distribution of cluster membership.
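A minimal R sketch of the seed-selection difference, on simulated data shaped like the set described above (about 85% of the points in one tight group). The simulated data, the radius used to decide what counts as "inside the dense group", and the uniform-over-the-range scheme are all assumptions for illustration; the latter is only a hypothetical stand-in for "some sort of random selection of values", not FASTCLUS's documented rule. The sketch counts how many of k seeds land inside the dense group under each scheme.

```r
# Sketch: where do k initial seeds land under two seeding schemes?
# Scheme 2 ("uniform over the range") is a HYPOTHETICAL stand-in for
# "a random selection of values"; it is not taken from the SAS documentation.

set.seed(1)
n_dense  <- 850
n_sparse <- 150
x <- rbind(
  matrix(rnorm(2 * n_dense, mean = 0, sd = 0.1), ncol = 2),   # tight group near the origin
  matrix(runif(2 * n_sparse, min = -10, max = 10), ncol = 2)  # sparse points over a wide box
)
k <- 10

# Scheme 1: k distinct observations, as kmeans() does when 'centers' is a number
seeds_from_rows <- function(x, k) x[sample.int(nrow(x), k), , drop = FALSE]

# Scheme 2 (hypothetical): k points drawn uniformly within each column's range
seeds_from_range <- function(x, k) {
  apply(x, 2, function(col) runif(k, min(col), max(col)))
}

# Count seeds inside the dense group (assumed: within radius 0.5 of the origin)
in_dense <- function(seeds) sum(sqrt(rowSums(seeds^2)) < 0.5)

trials <- 1000
rows_hits  <- replicate(trials, in_dense(seeds_from_rows(x, k)))
range_hits <- replicate(trials, in_dense(seeds_from_range(x, k)))

# Average number of seeds landing in the dense group under each scheme
c(rows = mean(rows_hits), range = mean(range_hits))
```

Either seed matrix can be passed to kmeans() through its centers argument to compare the final cluster sizes. Seeds that are not observations can leave a cluster with no points, in which case the default Hartigan-Wong algorithm stops with an "empty cluster" error.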
Because the data I was using seems to contain a very large, homogeneous group that occupies a very small region, a random selection of seed values (FASTCLUS) is very unlikely to place more than one seed inside that large (in number of observations), dense cluster, which has the effect of keeping it intact during the iterations. kmeans(), on the other hand, uses a random selection of observations, each of which has a high probability of coming from the large, dense cluster, so multiple seeds will most likely be planted in that area, causing it to break up during the iterations. At least that's my take on it; does anyone see anything wrong with this line of reasoning?

Andy

On Fri, Dec 3, 2010 at 10:15 AM, Georg Ruß <resea...@georgruss.de> wrote:
> On 02/12/10 17:49:37, Andrew Agrimson wrote:
> > I've been comparing results from kmeans() in R to PROC FASTCLUS in SAS
> > and I'm getting drastically different results with a real life data set.
> > [...] Has anybody looked into the differences in the implementations or
> > have any thoughts on the matter?
>
> Hi Andrew,
>
> as per the website below, it looks as if PROC FASTCLUS is implementing a
> certain flavor of k-Means:
>
> http://www.technion.ac.il/docs/sas/stat/chap27/sect2.htm
>
> As per the manpage ?kmeans, the R implementation of k-Means has the option
> to set one of the algorithms explicitly:
>
> algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")
>
> I don't know whether you've tried that, but you may start by setting these
> algorithm variants explicitly and see what the outcome is.
>
> Regards,
> Georg.
> --
> Research Assistant
> Otto-von-Guericke-Universität Magdeburg
> resea...@georgruss.de
> http://research.georgruss.de
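Georg's suggestion can be tried in a few lines. The sketch below runs kmeans() once per algorithm on simulated data of the same shape and reports the sorted cluster sizes; the data, k = 10, nstart = 5 and iter.max = 100 are assumptions chosen for illustration, not values from this thread. In R's kmeans(), "Lloyd" and "Forgy" select the same update rule, so any difference between those two entries comes from the random starts. Warnings (for example about empty clusters or non-convergence) are suppressed to keep the output short.

```r
# Sketch of Georg's suggestion: fit kmeans() with each algorithm explicitly
# and compare cluster sizes. Data and tuning values are illustrative assumptions.

set.seed(2)
x <- rbind(
  matrix(rnorm(2 * 850, sd = 0.1), ncol = 2),          # large, dense group
  matrix(runif(2 * 150, min = -10, max = 10), ncol = 2) # sparse remainder
)
k <- 10
algorithms <- c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")

sizes <- lapply(algorithms, function(alg) {
  # nstart > 1 keeps the best of several random-row starts (lowest total within-cluster SS)
  fit <- suppressWarnings(
    kmeans(x, centers = k, nstart = 5, iter.max = 100, algorithm = alg)
  )
  sort(fit$size, decreasing = TRUE)
})
names(sizes) <- algorithms
sizes
```

Comparing these size vectors with the FASTCLUS output on the real data would show whether the algorithm variant or the seeding is driving the difference.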
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.