On Thu, 11 Feb 2010, Christian Hennig wrote:
> It is well known that hierarchical methods are problematic with too large
> dissimilarity matrices; even if you resolve the memory problem, the number of
> operations required is enormous.
There is at least one exception to this. Single-linkage hierarchical clustering with a convex distance such as Euclidean distance is feasible for quite large data sets using algorithms for the Euclidean minimum spanning tree. For tens to hundreds of thousands of points (flow cytometry data) the algorithm in the nnclust package is competitive in speed with model-based clustering (on a 32-bit system). It's slower than pam(), but it is deterministic.

This doesn't apply to the original question, of course.

-thomas

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
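The exception in the reply above rests on a standard fact: single-linkage clusters are exactly the connected components left after deleting the k-1 longest edges of the minimum spanning tree. Below is a minimal Python sketch (standard library only) of that connection. It uses plain O(n^2) Prim's algorithm and computes Euclidean distances on the fly, so no n-by-n dissimilarity matrix is ever stored. It is not the algorithm nnclust uses, and the function names `euclidean_mst` and `single_linkage_labels` are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Edge = Tuple[float, int, int]  # (length, i, j)


def euclidean_mst(points: List[Point]) -> List[Edge]:
    """Hypothetical helper: Prim's algorithm on the complete Euclidean graph.

    Sketch only, not nnclust's implementation. O(n^2) time but O(n) extra
    memory, because distances are computed when needed instead of stored.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from the tree to each point
    parent = [-1] * n      # tree vertex that achieves that distance
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        # add the closest point that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # update distances using the newly added point
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def single_linkage_labels(points: List[Point], k: int) -> List[int]:
    """Hypothetical helper: cut the single-linkage tree into k clusters by
    dropping the k-1 longest MST edges and labelling the components."""
    n = len(points)
    edges = sorted(euclidean_mst(points))
    keep = edges[: max(n - k, 0)]  # the n-1 MST edges minus the k-1 longest
    root = list(range(n))          # union-find forest

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in keep:
        root[find(i)] = find(j)
    # relabel components as 0, 1, 2, ... in order of first appearance
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

For example, `single_linkage_labels([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` returns `[0, 0, 0, 1, 1, 1]`. Avoiding the stored dissimilarity matrix removes the memory problem, but this naive version still does O(n^2) distance computations; faster Euclidean MST algorithms are what make hundreds of thousands of points practical.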