RE: [R] agglomerative coefficient in agnes (cluster)
Thanks very much, Andy, for the code and the explanation. The meaning of AC is much clearer now.

I did notice, when I tried the code, that the results were not exactly the same as yours:

    sapply(c(.25, .5), testAC, x=x[1:4], method="single")
    Loading required package: cluster
    Error in FUN(X[[1]], ...) : Object x not found
    x=rnorm(50)
    sapply(c(.25, .5), testAC, x=x[1:4], method="single")
    [1] 0.7450599 0.9926918

    version
             _
    platform i686-pc-linux-gnu
    arch     i686
    os       linux-gnu
    system   i686, linux-gnu
    status
    major    2
    minor    0.1
    year     2004
    month    11
    day      15
    language R

Regards,
Weiguang

--- Liaw, Andy [EMAIL PROTECTED] wrote:
> It has to do with sample sizes. Consider the following:
>
>     testAC <- function(prop1=0.5, x=rnorm(50), center=c(0, 100), ...) {
>         stopifnot(require(cluster))
>         n <- length(x)
>         n1 <- ceiling(n * prop1)  # size of the group shifted by center[1]
>         n2 <- n - n1              # size of the group shifted by center[2]
>         agnes(x + rep(center, c(n1, n2)), ...)$ac
>     }
>
> Now some tests:
>
>     sapply(c(.25, .5), testAC, x=x[1:4], method="single")
>     [1] 0.7427591 0.9862944
>     sapply(1:5 / 10, testAC, x=x[1:10], method="single")
>     [1] 0.8977139 0.9974224 0.9950061 0.9946366 0.9946366
>     sapply(1:5 / 10, testAC, x=x, method="single")
>     [1] 0.9982955 0.9969757 0.9971114 0.9971127 0.9975111
>
> So it seems like AC does not consider isolated singletons as cluster structures. This is only discernible with small sample sizes, though.
>
> Andy
>
> --
> Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as Banyu) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system.
> --

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
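
A note on the differing numbers in the message above: the two runs need not agree, because x comes from rnorm(50), which draws a fresh random sample unless the random seed is fixed first. A minimal sketch of that point, assuming an arbitrary seed of 1 (the seed is an illustration, not a value taken from this thread):

```r
# Assumption: the seed value 1 is arbitrary and chosen only for illustration.
set.seed(1)
x1 <- rnorm(50)

set.seed(1)
x2 <- rnorm(50)      # same seed, so the same 50 draws as x1

x3 <- rnorm(50)      # no reseeding, so a different sample

same_seed_matches <- identical(x1, x2)
unseeded_matches  <- identical(x1, x3)
print(c(same_seed_matches, unseeded_matches))
```

Posting the seed together with the code lets other readers reproduce the same ac values.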
RE: [R] agglomerative coefficient in agnes (cluster)
Thanks again, Andy.

The definition of AC is understood, yet I have trouble picturing the amount of clear clustering structure it measures. To put things into perspective, for the two series 1,2,1000,1001 and 1,2,3,1000, agnes(x, method="single") generates ac values of 0.998998 and 0.7492477 respectively, yet it seems to me that both have fairly clear clustering structures.

--- Liaw, Andy [EMAIL PROTECTED] wrote:
> BTW, I checked the book. You're not going to find much more than that.

Thanks for checking.

Weiguang
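
The two ac values quoted above can be reproduced by hand. This is a sketch that assumes the definition in help(agnes.object) reads as follows: m(i) is the dissimilarity at which observation i is first merged, divided by the dissimilarity of the final merge, and AC is the average of 1 - m(i). The symbols d_first and d_final are shorthand introduced here, not notation from the cluster package.

```latex
\[
\mathrm{AC} = \frac{1}{n}\sum_{i=1}^{n}\bigl(1 - m(i)\bigr),
\qquad
m(i) = \frac{d_{\mathrm{first}}(i)}{d_{\mathrm{final}}}
\]

% Series 1, 2, 1000, 1001 (single linkage):
% every point first merges at distance 1; the final merge is at 1000 - 2 = 998.
\[
\mathrm{AC} = \frac{1}{4}\cdot 4\left(1 - \frac{1}{998}\right) \approx 0.998998
\]

% Series 1, 2, 3, 1000 (single linkage):
% points 1, 2, 3 first merge at distance 1; the point 1000 first merges
% only at the final step, at distance 1000 - 3 = 997.
\[
\mathrm{AC} = \frac{1}{4}\left[\,3\left(1 - \frac{1}{997}\right)
      + \left(1 - \frac{997}{997}\right)\right] \approx 0.7492477
\]
```

Under this reading, a point that joins only at the final step contributes exactly 0, so with one such point AC cannot exceed (n - 1)/n, which is 0.75 for n = 4.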
RE: [R] agglomerative coefficient in agnes (cluster)
> -----Original Message-----
> From: Weiguang Shi
>
> Thanks again Andy. The definition of AC is understood, yet I have trouble picturing the amount of clear clustering structure it measures. To put things into perspective, for the two series 1,2,1000,1001 and 1,2,3,1000, agnes(x, method="single") generates ac values of 0.998998 and 0.7492477 respectively, yet it seems to me that both have fairly clear clustering structures.

It has to do with sample sizes. Consider the following:

    testAC <- function(prop1=0.5, x=rnorm(50), center=c(0, 100), ...) {
        stopifnot(require(cluster))
        n <- length(x)
        n1 <- ceiling(n * prop1)  # size of the group shifted by center[1]
        n2 <- n - n1              # size of the group shifted by center[2]
        agnes(x + rep(center, c(n1, n2)), ...)$ac
    }

Now some tests:

    sapply(c(.25, .5), testAC, x=x[1:4], method="single")
    [1] 0.7427591 0.9862944
    sapply(1:5 / 10, testAC, x=x[1:10], method="single")
    [1] 0.8977139 0.9974224 0.9950061 0.9946366 0.9946366
    sapply(1:5 / 10, testAC, x=x, method="single")
    [1] 0.9982955 0.9969757 0.9971114 0.9971127 0.9975111

So it seems like AC does not consider isolated singletons as cluster structures. This is only discernible with small sample sizes, though.

Andy
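
The singleton effect described above can be made visible by splitting AC into per-observation contributions. The sketch below uses base R's hclust rather than agnes, on the assumption that single linkage on Euclidean distances yields the same merge heights. The helper ac_contrib is hypothetical (it is not part of the cluster package), and the seed and sample size are arbitrary.

```r
# Hypothetical helper: per-observation contributions 1 - m(i), where m(i) is
# the height of the first merge involving observation i divided by the height
# of the final merge. Their mean is the agglomerative coefficient.
ac_contrib <- function(x, method = "single") {
    hc <- hclust(dist(x), method = method)
    first_height <- numeric(length(hc$order))
    for (step in seq_len(nrow(hc$merge))) {
        # Negative entries of hc$merge are original observations that join
        # the tree for the first time at this step.
        row <- hc$merge[step, ]
        first_height[-row[row < 0]] <- hc$height[step]
    }
    1 - first_height / tail(hc$height, 1)
}

set.seed(1)                                  # arbitrary seed, for illustration
x <- rnorm(10)
shifted <- x + rep(c(0, 100), c(1, 9))       # one isolated point, nine near 100
contrib <- ac_contrib(shifted)

print(round(contrib, 4))                     # the first entry is the isolated point
print(mean(contrib))                         # bounded above by 9/10
```

With one point joining only at the last step, its contribution is 0 and the mean is capped at (n - 1)/n. That is consistent with the values near 0.75 for four points and near 0.90 for ten points in the tests above, and the cap approaches 1 as n grows.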
RE: [R] agglomerative coefficient in agnes (cluster)
Well, I am not sure that you can call a single figure a cluster. Sure, it's not near the others, but how can you conceptually measure its cluster properties? It seems reasonable that there has to be some form of doubt about it.

Back to that Google search: hit number 3, www.stat.ncu.edu.tw/teacher/hungy/mva/notes/lecture-cluster-example.pdf, gives examples which are not close to 1. It is said that "The quality of an agglomerative clustering of the data can be measured by the agglomerative coefficient"; this is ascribed to Kaufman, L. and Rousseeuw, P. (1990), Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York.

After I had read some of the recent work on clustering, I realised that clustering is as much art as it is anything else. There is a wealth of papers with arguments about which methods should be used to assess the effectiveness of the clustering process. I don't think it matters which type of evaluation method you use: the results are not absolute numbers, and they need to be seen as relative. They also need to be seen as an attempt at modelling a method of quality assessment for which there is no clear winner.

So the bottom line is that if, for your purposes, a single number on its own should be classified as a group, you may well have to define your own method of evaluation.

Tom

> -----Original Message-----
> From: Weiguang Shi [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, 27 January 2005 7:28 AM
> To: Liaw, Andy
> Cc: rhelp
> Subject: RE: [R] agglomerative coefficient in agnes (cluster)
>
> Thanks again Andy. The definition of AC is understood, yet I have trouble picturing the amount of clear clustering structure it measures. To put things into perspective, for the two series 1,2,1000,1001 and 1,2,3,1000, agnes(x, method="single") generates ac values of 0.998998 and 0.7492477 respectively, yet it seems to me that both have fairly clear clustering structures.
RE: [R] agglomerative coefficient in agnes (cluster)
Google really can be a very useful thing, in case you haven't found that. This is the first hit I got with `agglomerative coefficient':

http://www.unesco.org/webworld/idams/advguide/Chapt7_1_4.htm

Andy

> From: Weiguang Shi
>
> I haven't read the book, but could anyone explain more about this parameter? help(agnes) says that ac measures the amount of clustering structure found. From the definition given in help(agnes.object), however, it seems that as long as the dissimilarity of the merger in the final step of the algorithm is large enough, the ac value will be close to 1. So what does ac really mean?
>
> Thank you,
> Weiguang
RE: [R] agglomerative coefficient in agnes (cluster)
Thanks, Andy. Google is really useful. But that page doesn't answer my question, does it?

I repeat: AC depends heavily on the dissimilarity of the last merge. My question: what is the use of AC?

Weiguang

--- Liaw, Andy [EMAIL PROTECTED] wrote:
> Google really can be a very useful thing, in case you haven't found that. This is the first hit I got with `agglomerative coefficient':
>
> http://www.unesco.org/webworld/idams/advguide/Chapt7_1_4.htm
>
> Andy
[R] agglomerative coefficient in agnes (cluster)
I haven't read the book, but could anyone explain more about this parameter?

help(agnes) says that ac measures the amount of clustering structure found. From the definition given in help(agnes.object), however, it seems that as long as the dissimilarity of the merger in the final step of the algorithm is large enough, the ac value will be close to 1. So what does ac really mean?

Thank you,
Weiguang
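
The observation in the question can be checked numerically. The following is a minimal sketch, assuming the cluster package is installed; the helper gap_ac, the six data points, and the gap values are made up for illustration. It keeps two tight groups fixed and only widens the distance between them, which raises the dissimilarity of the final merge.

```r
library(cluster)

# Hypothetical helper: two groups of three points, one unit apart within each
# group, with the second group offset from the first by `gap`.
gap_ac <- function(gap) {
    x <- c(1, 2, 3, 1 + gap, 2 + gap, 3 + gap)
    agnes(x, method = "single")$ac
}

gaps <- c(5, 50, 500)
acs <- sapply(gaps, gap_ac)
print(data.frame(gap = gaps, ac = acs))
```

The within-group structure is identical in all three cases, yet ac rises toward 1 as the gap grows, which matches the reading that ac is driven by the ratio of the early merge heights to the final one.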