On Wed, 18 Oct 2006, Weiwei Shi wrote:

> Dear Chris:
>
> I tried cor+1, but the average silhouette width is still negative.

Well, then it seems that the clustering is not that good.
I don't know your data, and there is no theoretical reason why the average
silhouette width has to be positive. You should read the Kaufman and
Rousseeuw book ("Finding Groups in Data") to understand the average
silhouette width better.
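
To make the point about negative values concrete, here is a minimal sketch of
the average silhouette width computed by hand in base R. The function name
avg_sil_width and the toy data are assumptions made up for illustration; this
is not the code behind cluster.stats, and it assumes at least two clusters.
For each point, a is the mean distance to its own cluster and b is the mean
distance to the nearest other cluster, so s = (b - a) / max(a, b) becomes
negative whenever a point is on average closer to another cluster.

```r
# Average silhouette width from scratch (base R only; illustrative sketch).
# d: a "dist" object; cl: cluster labels, one per observation (>= 2 clusters).
avg_sil_width <- function(d, cl) {
  m <- as.matrix(d)
  n <- nrow(m)
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- cl == cl[i]
    own[i] <- FALSE                       # exclude the point itself
    if (!any(own)) { s[i] <- 0; next }    # singleton cluster: s(i) = 0 by convention
    a <- mean(m[i, own])                  # mean distance to own cluster
    others <- setdiff(unique(cl), cl[i])
    b <- min(sapply(others, function(k) mean(m[i, cl == k])))  # nearest other cluster
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Toy data (made up): two well-separated groups on a line.
x    <- c(0, 1, 2, 10, 11, 12)
d    <- dist(x)
good <- c(1, 1, 1, 2, 2, 2)   # labels that match the groups
bad  <- c(1, 2, 1, 2, 1, 2)   # labels that cut across the groups

avg_sil_width(d, good)   # positive
avg_sil_width(d, bad)    # negative
```

A negative average therefore says that, under the dissimilarity that was
supplied, many points sit closer to a neighbouring cluster than to their own.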

Best wishes,
Christian

>
>> set.seed(1000)
>> t9 <- cor(t(x), method="pearson")+1 # here i add 1
>> t8 <- as.dist(t9)
>> t7 <- cutree(hclust(t8), 4)
>> cluster.stats(t8, t7)$avg.silwidth
> [1] -0.008750826
>> set.seed(1000)
>> t9 <- cor(t(x), method="pearson") # here I did not add 1
>> t8 <- as.dist(t9)
>> t7 <- cutree(hclust(t8), 4)
>> cluster.stats(t8, t7)$avg.silwidth
> [1] -0.09543089
>
> On 10/18/06, Weiwei Shi <[EMAIL PROTECTED]> wrote:
>> Dear Chris:
>> 
>> thanks for the prompt reply!
>> 
>> You are right, the Pearson-based dist has negative values, so I should
>> use cor+1 in my case (since negatively correlated genes should be
>> considered farthest). Thanks.
>> 
>> As for ?cluster.stats, I double-checked and found that I need to
>> restart JGR; only then does the help system pick up a newly loaded
>> package such as fpc.
>> 
>> sorry for the confusion,
>> 
>> weiwei
>> 
>> On 10/18/06, Christian Hennig <[EMAIL PROTECTED]> wrote:
>> > Dear Weiwei,
>> >
>> > > btw, ?cluster.stats does not work on my Mac machine.
>> > >> version
>> > >              _
>> > > platform       i386-apple-darwin8.6.1
>> > > arch           i386
>> > > os             darwin8.6.1
>> > > system         i386, darwin8.6.1
>> > > status
>> > > major          2
>> > > minor          3.1
>> > > year           2006
>> > > month          06
>> > > day            01
>> > > svn rev        38247
>> > > language       R
>> > > version.string Version 2.3.1 (2006-06-01)
>> >
>> > Because I don't have access to a Mac, I can't tell you anything about
>> > this, unfortunately.
>> > I always thought that my package should work on all platforms if it
>> > passes all the standard tests for packages?
>> > (Is there anyone else who could comment on this please?)
>> >
>> > > I have a sample like this
>> > >> dim(dd.df)
>> > > [1] 142  28
>> > >
>> > > and I want to cluster rows;
>> > > first of all, I followed the examples for cluster.stats() by
>> > > d.dd <- dist(dd.df) # use Euclidean
>> > > d.4 <- cutree(hclust(d.dd), 4) # 4 clusters I tried
>> > > cluster.stats(d.dd, d.4) # gives me some results like this:
>> > >
>> > > $cluster.size
>> > > [1] 133   5   2   2
>> > >
>> > > $avg.silwidth
>> > > [1] 0.9857916
>> > >
>> > > But when I use a Pearson-based distance, 4 or 5 clusters look good
>> > > to me visually, yet I get a very bad avg.silwidth:
>> > >
>> > > d.dd <- as.dist(cor(t(x),method="pearson")) # is This correct?
>> > > $cluster.size
>> > > [1] 86 31  6 19
>> > >
>> > > $avg.silwidth
>> > > [1] -0.09543089
>> >
>> > cor can give negative values, which doesn't fit the usual definition
>> > of a distance. I don't know what as.dist does in this case, but I think
>> > that, depending on your application, you should rather use the absolute
>> > value of the correlation, or 1+cor.
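
As a hedged illustration of the point above: as.dist() only extracts the lower
triangle of the matrix and does not check that the values behave like
distances, so negative correlations pass through unchanged. The sketch below
uses made-up toy rows (a, b, c) and compares several transformations. The
variants 1 - r and 1 - abs(r) are common conventions named here as
assumptions; they are not what was run in this thread. Which one is
appropriate depends on whether anti-correlated rows should count as far apart
or as similar.

```r
# Toy matrix (made up): rows are the objects to cluster.
set.seed(1)
base <- rnorm(20)
x <- rbind(a = base,
           b = base + rnorm(20, sd = 0.1),    # strongly positively correlated with a
           c = -base + rnorm(20, sd = 0.1))   # strongly negatively correlated with a

r <- cor(t(x), method = "pearson")

d_raw  <- as.dist(r)            # correlation used directly: contains negative "distances"
d_plus <- as.dist(1 + r)        # non-negative, but correlated rows a, b get the LARGEST value
d_one  <- as.dist(1 - r)        # assumption: anti-correlated rows should be farthest apart
d_abs  <- as.dist(1 - abs(r))   # assumption: the sign of the correlation does not matter

round(as.matrix(d_plus), 2)
round(as.matrix(d_one), 2)
```

Note that with 1 + r the highly correlated pair (a, b) ends up far apart and
the anti-correlated pair (a, c) ends up close, which is the opposite ordering
from 1 - r.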
>> >
>> > > btw, what's $separation? Where can I find a detailed explanation of
>> > > the output from cluster.stats?
>> >
>> > This is documented on the cluster.stats help page:
>> >
>> > separation: vector of clusterwise minimum distances of a point in the
>> >            cluster to a point of another cluster.
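
Read literally, that help-page sentence can be sketched in a few lines of base
R. The helper name separation_by_hand and the toy data are hypothetical; this
follows the wording quoted above and is not taken from the fpc source.

```r
# For each cluster: the smallest distance from any of its points
# to any point outside the cluster (illustrative sketch).
separation_by_hand <- function(d, cl) {
  m <- as.matrix(d)
  sapply(sort(unique(cl)), function(k) min(m[cl == k, cl != k]))
}

# Toy data (made up): three groups on a line.
x  <- c(0, 1, 2, 10, 11, 12, 30)
cl <- c(1, 1, 1, 2, 2, 2, 3)

separation_by_hand(dist(x), cl)   # 8 8 18
```

Clusters 1 and 2 are 8 apart at their closest points (2 and 10), while the
single point in cluster 3 is 18 away from its nearest neighbour (12).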
>> >
>> > Best regards,
>> > Christian
>> >
>> >
>> > *** --- ***
>> > Christian Hennig
>> > University College London, Department of Statistical Science
>> > Gower St., London WC1E 6BT, phone +44 207 679 1698
>> > [EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche
>> >
>> 
>> 
>> --
>> Weiwei Shi, Ph.D
>> Research Scientist
>> GeneGO, Inc.
>> 
>> "Did you always know?"
>> "No, I did not. But I believed..."
>> ---Matrix III
>> 

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
