RE: [R] agglomerative coefficient in agnes (cluster)

2005-01-27 Thread Weiguang Shi
Thanks very much, Andy, for the code and the
explanation.
The meaning of AC is much clearer now.

I did notice, when I tried the code, that the results were
not exactly the same as yours.
  > sapply(c(.25,.5), testAC, x=x[1:4], method="single")
  Loading required package: cluster
  Error in FUN(X[[1]], ...) : Object "x" not found
  > x=rnorm(50)
  > sapply(c(.25,.5), testAC, x=x[1:4], method="single")
  [1] 0.7450599 0.9926918

  > version
           _
  platform i686-pc-linux-gnu
  arch     i686
  os       linux-gnu
  system   i686, linux-gnu
  status
  major    2
  minor    0.1
  year     2004
  month    11
  day      15
  language R
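
A difference like the one reported above is what one would expect if the two sessions simply used different random draws for x, since rnorm() gives new values on every call unless a seed is fixed. The sketch below assumes that explanation. The seed values, the run_once wrapper and the as.matrix() call are illustrative additions, not part of the original code.

```r
library(cluster)

# Same helper as in Andy's message, with the assignment arrows restored.
# as.matrix() is an addition here, to hand agnes an explicit one-column matrix.
testAC <- function(prop1 = 0.5, x = rnorm(50), center = c(0, 100), ...) {
  n  <- length(x)
  n1 <- ceiling(n * prop1)   # size of the group shifted by center[1]
  n2 <- n - n1               # size of the group shifted by center[2]
  agnes(as.matrix(x + rep(center, c(n1, n2))), ...)$ac
}

# Hypothetical wrapper: fix the seed, draw x, then run the first test.
run_once <- function(seed) {
  set.seed(seed)
  x <- rnorm(50)
  sapply(c(.25, .5), testAC, x = x[1:4], method = "single")
}

same_seed  <- identical(run_once(42), run_once(42))  # same draw, same AC values
other_seed <- run_once(43)                           # a different draw of x
```

Two people who agree on a seed before calling rnorm() should see the same AC values, at least when they run the same versions of R and of the cluster package.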

Regards,
Weiguang

 --- Liaw, Andy [EMAIL PROTECTED] wrote:
> It has to do with sample sizes.  Consider the following:
>
> testAC <- function(prop1=0.5, x=rnorm(50), center=c(0, 100), ...) {
>     stopifnot(require(cluster))
>     n <- length(x)
>     n1 <- ceiling(n * prop1)
>     n2 <- n - n1
>     agnes(x + rep(center, c(n1, n2)), ...)$ac
> }
>
> Now some tests:
>
> > sapply(c(.25, .5), testAC, x=x[1:4], method="single")
> [1] 0.7427591 0.9862944
> > sapply(1:5 / 10, testAC, x=x[1:10], method="single")
> [1] 0.8977139 0.9974224 0.9950061 0.9946366 0.9946366
> > sapply(1:5 / 10, testAC, x=x, method="single")
> [1] 0.9982955 0.9969757 0.9971114 0.9971127 0.9975111
>
> So it seems like AC does not consider isolated singletons as cluster
> structures.  This is only discernable in small sample size, though.
>
> Andy
 
 
  
--
 Notice:  This e-mail message, together with any
 attachments, contains information of Merck & Co.,
 Inc. (One Merck Drive, Whitehouse Station, New
 Jersey, USA 08889), and/or its affiliates (which may
 be known outside the United States as Merck Frosst,
 Merck Sharp & Dohme or MSD and in Japan, as Banyu)
 that may be confidential, proprietary copyrighted
 and/or legally privileged. It is intended solely for
 the use of the individual or entity named on this
 message.  If you are not the intended recipient, and
 have received this message in error, please notify
 us immediately by reply e-mail and then delete it
 from your system.

--


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] agglomerative coefficient in agnes (cluster)

2005-01-26 Thread Weiguang Shi
Thanks again, Andy.

The definition of AC is understood, yet I have trouble
picturing the amount of "clear clustering structure"
it measures. To put things into perspective, for the two
series
   1,2,1000,1001
and
   1,2,3,1000
agnes(x, method="single") generates ac values of
0.998998 and 0.7492477 respectively, yet it seems to
me that both have fairly clear clustering structures.

 --- Liaw, Andy [EMAIL PROTECTED] wrote:
> BTW, I checked the book.  You're not going to find much
> more than that.
 
Thanks for checking.

Weiguang



RE: [R] agglomerative coefficient in agnes (cluster)

2005-01-26 Thread Liaw, Andy


> -----Original Message-----
> From: Weiguang Shi
>
> Thanks again Andy.
>
> The definition of AC is understood, yet I have trouble
> picturing the amount of "clear clustering structure"
> it measures. To put things into perspective, for two
> series
>    1,2,1000,1001
> and
>    1,2,3,1000
> agnes(x, method="single") generates ac values of
> 0.998998 and 0.7492477 respectively, yet it seems to
> me that both have fairly clear clustering structures.

It has to do with sample sizes.  Consider the following:

testAC <- function(prop1=0.5, x=rnorm(50), center=c(0, 100), ...) {
    stopifnot(require(cluster))
    n <- length(x)
    ## the first n1 points are shifted by center[1], the other n2 by center[2]
    n1 <- ceiling(n * prop1)
    n2 <- n - n1
    agnes(x + rep(center, c(n1, n2)), ...)$ac
}

Now some tests:

> sapply(c(.25, .5), testAC, x=x[1:4], method="single")
[1] 0.7427591 0.9862944
> sapply(1:5 / 10, testAC, x=x[1:10], method="single")
[1] 0.8977139 0.9974224 0.9950061 0.9946366 0.9946366
> sapply(1:5 / 10, testAC, x=x, method="single")
[1] 0.9982955 0.9969757 0.9971114 0.9971127 0.9975111

So it seems like AC does not consider isolated singletons as cluster
structures.  This is only discernible in small samples, though.
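
The definition in help(agnes.object) makes the singleton effect concrete. For each observation i, m(i) is its dissimilarity to the first cluster it is merged with, divided by the dissimilarity of the final merge, and ac is the average of 1 - m(i). An isolated point that only joins at the last step has m(i) = 1 and so contributes nothing. The sketch below works this out by hand for the two series from the question. It uses hclust() from base R rather than agnes(), on the assumption that both give the same single-linkage merge heights, and ac_by_hand is a hypothetical helper name.

```r
# Hypothetical helper: AC computed by hand from a single-linkage merge tree.
ac_by_hand <- function(x) {
  hc <- hclust(dist(x), method = "single")
  n  <- length(x)
  # In hc$merge a negative entry -j marks the step at which observation j
  # first joins a cluster; the row of that entry is the merge step.
  idx   <- match(-seq_len(n), hc$merge)
  first <- hc$height[row(hc$merge)[idx]]
  m <- first / max(hc$height)      # m(i) as defined in help(agnes.object)
  list(m = m, ac = mean(1 - m))
}

two_pairs      <- ac_by_hand(c(1, 2, 1000, 1001))  # every point merges early
with_singleton <- ac_by_hand(c(1, 2, 3, 1000))     # 1000 joins only at the end
```

In the second series the point 1000 has m = 1, so with four observations the coefficient cannot exceed 3/4. As the sample grows, one such singleton is averaged over more observations, which would explain why the effect fades in the larger tests.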

Andy


 


RE: [R] agglomerative coefficient in agnes (cluster)

2005-01-26 Thread Mulholland, Tom
Well, I am not sure that you can call a single figure a cluster. Sure, it's not
near the others, but how can you conceptually measure its cluster properties?
It seems reasonable that there has to be some form of doubt about it.

Back to that Google search: hit number 3,
www.stat.ncu.edu.tw/teacher/hungy/mva/notes/lecture-cluster-example.pdf,
gives examples which are not close to 1.

It is said that "The quality of an agglomerative clustering of the data can be
measured by the agglomerative coefficient"; this is ascribed to Kaufman L. and
Rousseeuw P. (1990), Finding Groups in Data: An Introduction to Cluster
Analysis, Wiley, New York. After I had read some of the recent work on
clustering I realised that clustering is as much art as it is anything else.
There is a wealth of papers with arguments about which methods should be used
to assess the effectiveness of the clustering process. I don't think it matters
which type of evaluation method you use: they are not absolute numbers, and
they need to be seen as relative. They also need to be seen as an attempt at
modelling a method of quality assessment for which there is no clear winner. So
the bottom line is that if, for your purposes, a single number on its own should
be classified as a group, you may well have to define your own method of
evaluation.
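
One way to see that the coefficient is relative rather than absolute is to compute it for the same data under several linkage methods. This is only a small sketch: it reuses the series 1,2,3,1000 from the question, and the choice of the three linkages is an illustrative assumption.

```r
library(cluster)

x <- as.matrix(c(1, 2, 3, 1000))                # the second series from the question
linkages <- c("single", "average", "complete")  # linkage names accepted by agnes()

# One AC value per linkage, named by the linkage that produced it.
ac_by_linkage <- sapply(linkages, function(m) agnes(x, method = m)$ac)
round(ac_by_linkage, 4)
```

The values are only meaningful when compared with each other on the same data, not as marks on a fixed scale.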

Tom

> -----Original Message-----
> From: Weiguang Shi [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 27 January 2005 7:28 AM
> To: Liaw, Andy
> Cc: rhelp
> Subject: RE: [R] agglomerative coefficient in agnes (cluster)
>
>
> Thanks again Andy.
>
> The definition of AC is understood, yet I have trouble
> picturing the amount of "clear clustering structure"
> it measures. To put things into perspective, for two
> series
>    1,2,1000,1001
> and
>    1,2,3,1000
> agnes(x, method="single") generates ac values of
> 0.998998 and 0.7492477 respectively, yet it seems to
> me that both have fairly clear clustering structures.
>
>  --- Liaw, Andy [EMAIL PROTECTED] wrote:
> > BTW, I checked the book.  You're not going to find much
> > more than that.
>
> Thanks for checking.
>
> Weiguang
 


RE: [R] agglomerative coefficient in agnes (cluster)

2005-01-25 Thread Liaw, Andy
Google really can be a very useful thing, in case you haven't found that.
This is the first hit I got with `agglomerative coefficient':

http://www.unesco.org/webworld/idams/advguide/Chapt7_1_4.htm

Andy

> From: Weiguang Shi
>
> I haven't read the book, but could anyone explain more
> about this parameter?
>
> help(agnes) says that ac measures the amount of
> clustering structure found. From the definition given
> in help(agnes.object), however, it seems that as long as
> the dissimilarity of the merger in the final step of the
> algorithm is large enough, the ac value will be close to
> 1. So what does ac really mean?
>
> Thank you,
> Weiguang
 


RE: [R] agglomerative coefficient in agnes (cluster)

2005-01-25 Thread Weiguang Shi
Thanks Andy. Google is really useful.

But that page doesn't answer my question, does it?

I repeat: AC depends heavily on the dissimilarity
of the last merge.
My question: what is the use of AC?

Weiguang


 --- Liaw, Andy [EMAIL PROTECTED] wrote:
> Google really can be a very useful thing, in case you haven't found that.
> This is the first hit I got with `agglomerative coefficient':
>
> http://www.unesco.org/webworld/idams/advguide/Chapt7_1_4.htm
>
> Andy
 


[R] agglomerative coefficient in agnes (cluster)

2005-01-24 Thread Weiguang Shi
I haven't read the book, but could anyone explain more
about this parameter?

help(agnes) says that ac measures the amount of
clustering structure found. From the definition given
in help(agnes.object), however, it seems that as long as
the dissimilarity of the merger in the final step of the
algorithm is large enough, the ac value will be close to
1. So what does ac really mean?
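
The dependence on the final merge can be checked numerically. The sketch below is illustrative only: the seed, the sample size and the gap values are arbitrary assumptions. It takes one random sample, moves half of the points away by an increasing gap, and records ac each time.

```r
library(cluster)

set.seed(1)                      # arbitrary seed, only for repeatability
x    <- rnorm(20)
gaps <- c(0, 1, 10, 100, 1000)   # illustrative offsets

# For each gap g, shift the last 10 points by g and record the AC.
ac_by_gap <- sapply(gaps, function(g) {
  shifted <- x + rep(c(0, g), each = 10)
  agnes(as.matrix(shifted), method = "single")$ac
})

data.frame(gap = gaps, ac = round(ac_by_gap, 4))
```

If the reading of the definition above is right, ac should approach 1 as the gap grows, because every m(i) is divided by an ever larger final dissimilarity.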

Thank you,
Weiguang
