On Wed, Apr 21, 2010 at 09:59:51AM +0200, Hans Ekbrand wrote: [...]
> head(clust.geo.test.tree$height, 70)
> [1] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
> 0.000000 0.000000 0.000000 0.000000
> [11] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
> 0.000000 0.000000 0.000000 0.000000
> [21] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
> 0.000000 0.000000 0.000000 0.000000
> [31] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
> 0.000000 0.000000 0.000000 0.000000
> [41] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
> 0.000000 0.000000 0.000000 0.000000
> [51] 0.000000 0.000000 0.000000 0.000000 3.160631 18.963676
> 30.398644 32.232351 37.927539 44.987446
> [61] 50.065192 81.542472 82.691738 93.553729 95.971207 105.325405
> 115.218371 119.540239 125.235381 130.181302
>
> As I understand this, the 54 zeroes represent identical coordinates.
> The positive numbers represent the distance in meters between points
> that have been grouped together at a certain level of the tree. Now, I
> am not interested in grouping together points with distances larger
> than 100 meters, so I would like to stop the clustering process at
> that point - or, after the hclust has completed, extract the clusters
> that were in effect at that level. In the above example that would be
> at level 65.

I found cutree(), and once I understood its "h" parameter, it all
worked out. Here's an example for the archives.

# Clustering
max.distance.in.same.cluster <- 100
print(load(url("http://sociologi.cjb.net/temp/clust.geo.test.RData")))
clust.geo.test.tree <- hclust(dist(clust.geo.t...@coords))
my.cluster <- cutree(clust.geo.test.tree, h = max.distance.in.same.cluster)

# Which clusters have more than one member?
sort(unique(my.cluster[which(duplicated(my.cluster))]))

# How many members do these clusters have?
sapply(sort(unique(my.cluster[which(duplicated(my.cluster))])),
       function(x) {length(which(my.cluster == x))})

# Print a sorted list of the longest distances within each of these clusters.
sort(sapply(sort(unique(my.cluster[which(duplicated(my.cluster))])),
            function(x) {max(dist(clust.geo.t...@coords[which(my.cluster == x),]))}))

Thanks again, Roger, for the pointer to hclust()
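
P.S. For anyone who cannot load the RData file, here is a minimal
self-contained sketch of the same idea on made-up data. The coordinates,
the seed and the names coords, tree, sizes, multi and diameters are my
own assumptions for illustration and are not part of the original data
set. It relies on hclust() using method = "complete" by default: with
complete linkage the merge height is the largest pairwise distance
between the merged groups, so cutting at h should leave no cluster with
an internal distance above h. It also uses table() to get the cluster
sizes in one step.

```r
# Hypothetical stand-in data: three tight groups of points (in meters)
# plus one isolated point. None of this comes from the original file.
set.seed(1)
centers <- rbind(c(0, 0), c(1000, 0), c(0, 1000))
coords <- centers[rep(1:3, each = 5), ] + matrix(runif(30, -20, 20), ncol = 2)
coords <- rbind(coords, c(5000, 5000))

max.distance.in.same.cluster <- 100

# hclust() defaults to complete linkage.
tree <- hclust(dist(coords))
my.cluster <- cutree(tree, h = max.distance.in.same.cluster)

# Cluster sizes, then the ids of clusters with more than one member.
sizes <- table(my.cluster)
multi <- as.integer(names(sizes)[as.vector(sizes) > 1])

# Longest pairwise distance within each multi-member cluster.
diameters <- sapply(multi, function(k) {
  max(dist(coords[my.cluster == k, , drop = FALSE]))
})
names(diameters) <- multi
sort(diameters)
```

With another linkage method (for example "single" or "average") the
height no longer bounds the within-cluster distance in this way, so the
check on the longest distances would be worth keeping.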
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo