Re: [R] cluster/distance large matrix (fwd)

Thomas Lumley Thu, 11 Feb 2010 07:18:17 -0800

On Thu, 11 Feb 2010, Christian Hennig wrote:

>It is well known that hierarchical methods are problematic with too large 
>dissimilarity matrices; even if you resolve the memory problem, the number of 
>operations required is enormous.



There is at least one exception to this. Single-linkage hierarchical clustering 
with a convex distance such as Euclidean distance is feasible for quite large 
data sets using algorithms for the Euclidean minimum spanning tree. For tens to 
hundreds of thousands of points (flow cytometry data) the algorithm in the 
nnclust package is competitive in speed with model-based clustering (on a 
32-bit system).  It's slower than pam(), but it is deterministic.
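
To illustrate why the minimum spanning tree helps: single-linkage clusters 
are the connected components left after deleting the longest edges of the 
MST, so the full dissimilarity matrix never has to be stored. The sketch 
below is not the nnclust algorithm. It is a naive Prim's algorithm in plain 
Python (function names are made up for this example) that computes distances 
on the fly, so it still needs O(n^2) time but only O(n) memory. A practical 
implementation would use a nearest-neighbour search structure to avoid 
looking at all pairs.

```python
import math


def euclidean_mst(points):
    """Prim's algorithm on the implicit complete Euclidean graph.

    Distances are computed on demand, so memory use is O(n) rather than
    O(n^2). Returns a list of (length, i, j) edges.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # shortest known distance from the tree to each point
    best_from = [-1] * n        # tree vertex that achieves best_dist
    edges = []
    current = 0
    in_tree[current] = True
    for _ in range(n - 1):
        nxt, nxt_dist = -1, math.inf
        for j in range(n):
            if in_tree[j]:
                continue
            # Only the most recently added vertex can improve best_dist.
            d = math.dist(points[current], points[j])
            if d < best_dist[j]:
                best_dist[j] = d
                best_from[j] = current
            if best_dist[j] < nxt_dist:
                nxt, nxt_dist = j, best_dist[j]
        edges.append((nxt_dist, best_from[nxt], nxt))
        in_tree[nxt] = True
        current = nxt
    return edges


def single_linkage_labels(points, k):
    """Single-linkage clustering into k groups by cutting the k-1 longest MST edges."""
    n = len(points)
    edges = sorted(euclidean_mst(points))
    kept = edges[: max(n - k, 0)]  # the n-k shortest edges leave k components

    parent = list(range(n))  # union-find over the kept edges

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for _, a, b in kept:
        parent[find(a)] = find(b)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(single_linkage_labels(pts, 2))
```

Sorting the MST edges by length also gives the merge heights of the full 
single-linkage dendrogram, so any number of clusters can be read off the same 
tree without recomputing it.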

This doesn't apply to the original question, of course.

     -thomas

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
