I can't put my finger on it, but something about your idea rubs me the wrong way. Maybe it's that the tree depends on the hierarchical clustering algorithm, and the choice of where to cut it should be based on something more defensible than "avoid singletons". In this example Hawaii is really different from New Hampshire; why would you want them clustered together?
But it's your work, your field of study, whatever. If you are going to do it anyway, one way would be to loop over cut heights:

hc <- hclust(dist(USArrests), "ave")
plot(hc)
hr <- range(hc$height)
tol <- diff(hr)/100  # step size between candidate cut heights
# raise the cut height until no cluster is a singleton
# (rect.hclust also draws the clusters on the current plot)
for(i in seq(1e-4+hr[1], hr[2], tol)){
  hcc <- rect.hclust(hc, h=i)
  if(all(sapply(hcc, length) > 1)) break
}
str(hcc)

# or if you prefer dendrogram
dend1 <- as.dendrogram(hc)
for(i in seq(1e-4+hr[1], hr[2], tol)){
  dend2 <- cut(dend1, h=i)
  # 'members' is the number of leaves in each lower branch
  if(all(sapply(dend2$lower, function(x) attr(x,'members')) > 1)) break
}
dend2

Cheers

On Thu, May 24, 2012 at 10:31 AM, <r-help.20.tre...@spamgourmet.com> wrote:
> Dear R-Help,
>
> I have a clustering problem with hclust that I hope someone can help
> me with. Consider the classic hclust example:
>
> hc <- hclust(dist(USArrests), "ave")
> plot(hc)
>
> I would like to cut the tree up in such a way so as to avoid small
> clusters, so that we get a minimum number of items in each cluster,
> and therefore avoid singletons. e.g. in this example, you can see that
> Hawaii is split off onto its own at quite a high level. I would like
> to avoid having a single item clustered on its own like this. How can
> I achieve this?
>
> I have tried manually modifying the tree using dendrapply but have not
> been able to produce a valid solution thus far..
>
> Suggestions are welcome.
>
> Best wishes,
>
> Mark
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
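
A follow-up note on the cut-height loop in the reply above: the same search can be sketched with cutree(), which returns cluster memberships without drawing rectangles on the plot. This is only a sketch under stated assumptions. `min_size` is a name introduced here for illustration (it is not in the original post), and the candidate heights are taken to be the merge heights in hc$height rather than an evenly spaced grid.

```r
hc <- hclust(dist(USArrests), "ave")

# Hypothetical parameter (not in the original post):
# smallest cluster size we are willing to accept
min_size <- 2

# Try each merge height from lowest to highest and stop at the
# first cut where every cluster has at least min_size members.
# At the top height everything is one cluster, so the loop
# always ends via break.
for (h in sort(hc$height)) {
  groups <- cutree(hc, h = h)
  if (min(table(groups)) >= min_size) break
}

h              # cut height that was selected
table(groups)  # resulting cluster sizes
```

Setting min_size higher than 2 asks for larger minimum clusters, at the cost of cutting higher up the tree, so the earlier caveat about merging very different states applies even more strongly.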