Re: [R] estimate the number of clusters

2003-06-10 Thread Martin Maechler
>>>>> "MM" == Martin Maechler [EMAIL PROTECTED]
>>>>>     on Tue, 10 Jun 2003 18:12:36 +0200 writes:

    MM> Ping, you found another bug in silhouette.default() --
    MM> which can happen when there's one cluster with exactly
    MM> one observation.

    MM> I'll let you know more, once I have a complete fix.
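
For readers unfamiliar with the quantity involved: the silhouette width of
an observation compares a, its average dissimilarity to the other members of
its own cluster, with b, the smallest average dissimilarity to any other
cluster, as s = (b - a) / max(a, b). A cluster with exactly one observation
has no "other members", so it needs a special case. The following is a
minimal base R sketch of that computation. sil_widths() is a hypothetical
helper written for illustration, not the code in the cluster package, and
setting s = 0 for a singleton is an assumption of this sketch.

```r
# Hypothetical helper (illustration only, not the cluster package's code).
# dmatrix: full symmetric dissimilarity matrix; x: integer cluster codes.
sil_widths <- function(dmatrix, x) {
  s <- numeric(length(x))
  for (j in sort(unique(x))) {
    iC <- x == j                 # members of cluster j
    Nj <- sum(iC)
    if (Nj == 1) {               # singleton: a is undefined
      s[iC] <- 0                 # assumption of this sketch
      next
    }
    # a: mean dissimilarity to the other members of the own cluster
    a <- colSums(dmatrix[iC, iC, drop = FALSE]) / (Nj - 1)
    # mean dissimilarity to each other cluster: (k-1) x Nj matrix
    others <- dmatrix[!iC, iC, drop = FALSE]
    means  <- rowsum(others, x[!iC]) / as.vector(table(x[!iC]))
    # b: the smallest of those means, per member of cluster j
    b <- apply(means, 2, min)
    s[iC] <- (b - a) / pmax(a, b)
  }
  s
}

# Made-up one-dimensional data; the last point forms a cluster of its own.
pts <- c(0, 1, 10, 11, 30)
cl  <- c(1, 1, 2, 2, 3)
s   <- sil_widths(as.matrix(dist(pts)), cl)
print(round(s, 3))
```

Note the use of drop = FALSE in both subsetting steps, so that the
submatrices stay matrices whatever the cluster sizes are.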

The patch for this bug {against an *installed* version of cluster}
is this:
---

--- cluster-version-1.7-2/library/cluster/R/cluster  Thu Jun  5 04:00:15 2003
+++ fixed/cluster/R/cluster  Tue Jun 10 18:56:17 2003
@@ -2019,11 +2019,11 @@
         wds[iC, "cluster"] <- j
         a.i <- if(Nj > 1) colSums(dmatrix[iC, iC])/(Nj - 1) else 0 # length(a.i)= Nj
         ## minimal distances to points in all other clusters:
-        diC <- rbind(apply(dmatrix[!iC, iC], 2,
+        diC <- rbind(apply(dmatrix[!iC, iC, drop = FALSE], 2,
                            function(r) tapply(r, x[!iC], mean)))# (k-1) x Nj
         minC <- max.col(-t(diC))
         wds[iC,"neighbor"] <- clid[-j][minC]
-        b.i <- diC[cbind(minC, seq(minC))]
+        b.i <- diC[cbind(minC, seq(along = minC))]
         s.i <- (b.i - a.i) / pmax(b.i, a.i)
         wds[iC,"sil_width"] <- s.i
     }

---

i.e. you add  ", drop = FALSE"  in line 2022
 and          "along ="         in line 2026
in the appropriate places.
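
Both changes guard against the same situation, a cluster of size one. A
small base R sketch of the two pitfalls, on a made-up 4 x 4 dissimilarity
matrix (the names dropped and kept are illustrative only):

```r
# Toy dissimilarity matrix; observation 1 is a cluster of its own.
dmatrix <- as.matrix(dist(c(0, 5, 6, 7)))
iC <- c(TRUE, FALSE, FALSE, FALSE)        # membership of the singleton cluster

# 1. Selecting a single column silently drops the matrix structure
dropped <- dmatrix[!iC, iC]               # plain vector, no dim attribute
kept    <- dmatrix[!iC, iC, drop = FALSE] # still a 3 x 1 matrix
print(is.null(dim(dropped)))              # TRUE, so apply(dropped, 2, ...) fails
print(dim(kept))                          # 3 1

# 2. seq() on a single number n counts 1..n instead of indexing the vector
minC <- 3L                                # length-one index vector
print(seq(minC))                          # 1 2 3
print(seq(along = minC))                  # 1
print(seq_along(minC))                    # 1 (equivalent, in later R versions)
```

With a length-one minC whose value is larger than 1, seq(minC) therefore
produces more index rows than there are observations in the cluster, which
is consistent with the "not a multiple of replacement length" error.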

A fixed version of cluster should appear soon, and also together
with R 1.7.1.
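
Once a fixed version is installed, the selection loop from the original
question (further down in this thread) could be sketched roughly as follows.
This is only a sketch under some assumptions: that the installed cluster
provides the silhouette() generic and summary(si)$avg.width as used in the
question; the names avg_width and best_k are made up here; and the pmax()
guard against tiny negative dissimilarities is an addition of this sketch,
not part of the original code.

```r
library(cluster)

mydata <- iris[, 1:4]
maxk   <- 15                               # at most 15 clusters

# Correlation-based dissimilarity between observations; pmax() clips tiny
# negative values caused by rounding (an addition of this sketch).
mdist <- as.dist(pmax(1 - cor(t(mydata)), 0))

# The dendrogram does not depend on k, so it is built once, outside the loop.
hc <- as.hclust(diana(mdist, diss = TRUE))

avg_width <- rep(NA_real_, maxk)           # entry k: average width for k clusters
for (k in 2:maxk) {
  si <- silhouette(cutree(hc, k = k), mdist)
  avg_width[k] <- summary(si)$avg.width
}

best_k <- which.max(avg_width)             # NA entries (k = 1) are ignored
print(best_k)
```

Storing one value per k and using which.max() keeps the position in the
vector equal to the number of clusters.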

Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16  Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich SWITZERLAND
phone: x-41-1-632-3408  fax: ...-1228   

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] estimate the number of clusters

2003-06-09 Thread ge yreyt
Dear All,
 
I am using Silhouette to estimate the number of clusters in a microarray
dataset.
 
Initially, I used the iris data to test my piece of code as follows:
 
library(cluster)
data(iris)
mydata <- iris[, 1:4]
maxk <- 15                  # at most 15 clusters
myindex <- rep(0, maxk)     # holds the si value for each number of clusters k
mdist <- 1 - cor(t(mydata)) # dissimilarity
mdist <- as.dist(mdist)
for(k in 2:maxk)
{
 hc <- diana(mdist, diss = TRUE, stand = FALSE)
 si <- silhouette.default(cutree(as.hclust(hc), k = k), mdist)
 myindex[k] <- summary(si)$avg.width  # store the value for this k
}
myk <- rev(order(myindex))[1]  # select the number of clusters k with the
                               # largest si value
 
I ran into the following problems:
 
> for(k in 2:maxk)
+ {
+  hc <- diana(mdist, diss = TRUE, stand = FALSE)
+  si <- silhouette.default(cutree(as.hclust(hc), k = k), mdist)
+  myindex[k] <- summary(si)$avg.width
+ }
Error in "[<-"(*tmp*, iC, "sil_width", value = s.i) :
        number of items to replace is not a multiple of replacement length
In addition: Warning messages:
1: longer object length
        is not a multiple of shorter object length in: b.i - a.i
2: number of rows of result
        is not a multiple of vector length (arg 2) in: cbind(mmm, as.vector(each))
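
As the reply in this thread notes, an error of this kind can happen when one
cluster contains exactly one observation. A quick way to check whether a
given cut of the dendrogram produces such a cluster is to tabulate the
cluster sizes. The sketch below uses base R's hclust() on made-up values
with one obvious outlier, purely for illustration; the same table() check
should apply to the output of cutree() in the loop above.

```r
# Made-up one-dimensional data with a single outlier (illustration only)
x  <- c(1.0, 1.1, 1.2, 5.0, 5.1, 5.3, 20.0)
hc <- hclust(dist(x), method = "average")

# For each k, print the cluster sizes and record whether any cluster
# consists of exactly one observation.
has_singleton <- sapply(2:4, function(k) {
  sizes <- table(cutree(hc, k = k))
  cat("k =", k, " sizes:", as.vector(sizes), "\n")
  any(sizes == 1)
})
names(has_singleton) <- 2:4
print(has_singleton)
```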

Could anyone help me solve these problems?
 
Your kind help is highly appreciated!
 
ping

 


