Re: [R] which function to use to do classification
I find it helpful to explain to my colleagues from non-mathematical backgrounds that in classification the classes are predefined, whereas in clustering the classes (and sometimes the number of classes) are not.

I prefer the term "class discovery" over "clustering" when people cluster samples in order to derive meaningful classes.

Regards, Adai

On Wed, 2006-03-29 at 18:52 -0500, Liaw, Andy wrote:
> In addition to Brian's comment, Gordon's book, already in its 2nd edition, is all about clustering, but the title is simply `Classification'.
>
> Andy

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
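The distinction drawn above (predefined classes versus discovered classes) can be sketched in R. This is only an illustration under assumptions not found in the thread: the built-in iris data stands in for the poster's N*M matrix, and MASS::lda() is picked as one arbitrary example of a classification tool.

```r
# Illustrative sketch only: iris stands in for the poster's N*M matrix.
library(MASS)  # provides lda(); chosen here purely as an example classifier

x <- as.matrix(iris[, 1:4])
known <- iris$Species

# Classification: the class labels are predefined and used when fitting.
fit <- lda(x, grouping = known)
predicted <- predict(fit, x)$class
table(known, predicted)

# Clustering (class discovery): no labels are supplied, only the number of groups.
set.seed(1)
km <- kmeans(x, centers = 3, nstart = 10)
table(known, cluster = km$cluster)
```

Note that the cluster numbers returned by kmeans() are arbitrary, so the second table has to be read as a cross-tabulation rather than as a confusion matrix.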
Re: [R] which function to use to do classification
We have to be careful here. Classification (which is the terminology that the original poster used) is NOT the same as clustering, although the two are often confused. If the original poster wants to do clustering and examine the results for the presence of three clusters, that is fine, and there are many clustering methods that could be used. However, classification will require a different set of tools. If the clustering tools already pointed out are not doing what is needed (that is, that Cao actually is interested in clustering and not classification), then perhaps a further explanation of what the problem is would help clarify.

Sean

On 3/29/06 1:46 AM, Jacques VESLOT [EMAIL PROTECTED] wrote:
> try this (suppose mat is your matrix):
>
>   hc <- hclust(dist(mat, "manhattan"), "ward")
>   plot(hc, hang = -1)
>   (x <- identify(hc))  # right-click to stop
>   cutree(hc, 3)
>
>   km <- kmeans(mat, 3)
>   km$cluster
>   km$centers
>
>   pam(daisy(mat, metric = "manhattan"), k = 3, diss = TRUE)$clustering
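As a rough illustration of the "different set of tools" that classification calls for, the sketch below fits a k-nearest-neighbour classifier on labelled training rows and checks it on held-out rows. Everything here is an assumption made for the example: the iris data, the 100/50 split, k = 5, and the choice of class::knn() are not taken from the thread.

```r
# Illustrative sketch only: a supervised workflow needs known labels for training.
library(class)  # provides knn(); one of several possible classifiers

set.seed(42)
x <- scale(as.matrix(iris[, 1:4]))   # standardize the columns
labels <- iris$Species               # predefined classes

train <- sample(nrow(x), 100)        # row indices used for training

# Predict the class of each held-out row from its 5 nearest training rows.
pred <- knn(train = x[train, ], test = x[-train, ], cl = labels[train], k = 5)

accuracy <- mean(pred == labels[-train])
table(truth = labels[-train], predicted = pred)
accuracy
```

No such held-out check is possible with a clustering method applied to unlabelled data, which is one practical way to tell which of the two problems is actually being posed.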
Re: [R] which function to use to do classification
On Wed, 29 Mar 2006, Sean Davis wrote:

> We have to be careful here. Classification (which is the terminology that the original poster used) is NOT the same as clustering, although the two are often confused.

Well, in one of its two English senses it is the same. From a recent talk of mine (GfKL30), quoting the Concise Oxford Dictionary:

\emph{Classification} has two senses:
\begin{itemize}
\item `to arrange in classes or categories'
\item `assign (a thing) to a class or category'
\end{itemize}

There is a community (q.v. the International Federation of Classification Societies and the Journal of Classification, as well as the entry in the original Encyclopedia of Statistical Sciences) that means (almost) entirely the first sense. To add to this, the similar words to classification in e.g. French or German have (I am told) different shades of meaning.

> If the original poster wants to do clustering and examine the results for the presence of three clusters, that is fine and there are many methods for clustering that could be used. However, classification will require a different set of tools. If the clustering tools already pointed out are not doing what is needed (that is, that Cao actually is interested in clustering and not classification), then perhaps a further explanation of what the problem would help clarify.

Yes, further explanation would help.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] which function to use to do classification
> Well, in one of its two English senses it is the same. From a recent talk of mine (GfKL30), quoting the Concise Oxford Dictionary: [...]
>
> Yes, further explanation would help.

My intention is to arrange all the samples in classes. As a non-native English speaker, I should have checked the word before using it to express myself. The quotation makes perfect sense to me. Much appreciated!

Thank you, Jacques and Martin; your comments and suggestions are well received!

Best,
Baoqiang Cao
Re: [R] which function to use to do classification
In addition to Brian's comment, Gordon's book, already in its 2nd edition, is all about clustering, but the title is simply `Classification'.

Andy

From: Sean Davis
> We have to be careful here. Classification (which is the terminology that the original poster used) is NOT the same as clustering, although the two are often confused.
[R] which function to use to do classification
Dear All,

I have a data set; suppose it is an N*M matrix. All I want is to classify it into, let's say, 3 classes. Which method(s) do you think is (are) appropriate for this purpose? Any reference will be welcome!

Thanks!

Best,
Baoqiang Cao
Re: [R] which function to use to do classification
If you want to classify rows or columns, read:

  ?hclust
  ?kmeans
  library(cluster)
  ?pam
Re: [R] which function to use to do classification
Thanks! I tried kmeans, but the results are not very positive. Anyway, thanks Jacques! Please let me know if you have any other thoughts!

Best regards,
Baoqiang Cao

=== At 2006-03-29, 00:08:44 you wrote: ===
> if you want to classify rows or columns, read: ?hclust ?kmeans library(cluster) ?pam
Re: [R] which function to use to do classification
Try this (suppose mat is your matrix):

  # hierarchical clustering on Manhattan distances
  hc <- hclust(dist(mat, "manhattan"), "ward.D")  # called "ward" in R before 3.1.0
  plot(hc, hang = -1)
  (x <- identify(hc))  # right-click to stop
  cutree(hc, 3)

  # k-means with 3 centers
  km <- kmeans(mat, 3)
  km$cluster
  km$centers

  # partitioning around medoids on a Manhattan dissimilarity
  library(cluster)  # pam() and daisy()
  pam(daisy(mat, metric = "manhattan"), k = 3, diss = TRUE)$clustering
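To try the three approaches above without an existing data set, the following non-interactive sketch runs them on simulated data and cross-tabulates the resulting partitions. The matrix `mat` built here (60 rows in three shifted groups) is a hypothetical stand-in, `nstart = 25` is an added choice, and the interactive identify() step is left out.

```r
# Illustrative sketch only: 'mat' is simulated, not the poster's data.
library(cluster)  # provides pam() and daisy()

set.seed(123)
# 60 rows, 4 columns, in three groups of 20 with shifted means.
mat <- rbind(matrix(rnorm(80, mean = 0), ncol = 4),
             matrix(rnorm(80, mean = 4), ncol = 4),
             matrix(rnorm(80, mean = 8), ncol = 4))

# Hierarchical clustering cut into 3 groups ("ward.D" was "ward" before R 3.1.0).
hc_groups <- cutree(hclust(dist(mat, "manhattan"), "ward.D"), k = 3)

# k-means with several random starts to reduce dependence on initialization.
km_groups <- kmeans(mat, centers = 3, nstart = 25)$cluster

# Partitioning around medoids on a Manhattan dissimilarity.
pam_groups <- pam(daisy(mat, metric = "manhattan"), k = 3, diss = TRUE)$clustering

# Cluster numbers are arbitrary labels, so compare partitions by cross-tabulation.
table(hclust = hc_groups, kmeans = km_groups)
table(hclust = hc_groups, pam = pam_groups)
```

When the methods agree, each row of a cross-tabulation has a single non-zero cell; scattered counts suggest the three-group structure is not clear-cut for that method or dissimilarity.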
Re: [R] which function to use to do classification
Baoqiang Cao [EMAIL PROTECTED] wrote on Wed, 29 Mar 2006 00:46:01 -0500:

> Thanks!
> I tried kmeans, the results is not very positive. Anyway, thanks Jacques! Please let me know if you have any other thoughts!

My first recommendation would have been pam(), but Jacques mentioned that as well.

HOWEVER, note that many (unfortunately, nowadays even most) people doing cluster analysis have forgotten (or never known) the importance of the similarity / dissimilarity / distance that underlies almost all clustering methods (see the functions dist() and cluster::daisy()). The choice of dissimilarity includes variable transformation, selection, etc., which are things that need thinking in addition to software.

If you don't get very positive results, it could well be that you should start considering the above.

Martin Maechler, ETH Zurich
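The point about the dissimilarity can be made concrete with a small sketch. The data below are simulated for illustration only: a hypothetical `signal` column separates two groups, while a hypothetical `noise` column carries no group information but is measured on a much larger scale. The `stand = TRUE` argument of cluster::daisy() standardizes each column before the dissimilarities are computed.

```r
# Illustrative sketch only: simulated data, not from the thread.
library(cluster)  # provides daisy() and pam()

set.seed(7)
group  <- rep(1:2, each = 25)
signal <- rnorm(50, mean = c(0, 10)[group], sd = 1)  # separates the groups
noise  <- rnorm(50, mean = 0, sd = 1000)             # uninformative, large scale
mat <- cbind(signal, noise)

# Same metric, with and without standardizing the columns first.
d_raw <- daisy(mat, metric = "manhattan")
d_std <- daisy(mat, metric = "manhattan", stand = TRUE)

raw_fit <- pam(d_raw, k = 2, diss = TRUE)
std_fit <- pam(d_std, k = 2, diss = TRUE)

# Cross-tabulate each partition against the simulated groups.
table(group, raw = raw_fit$clustering)
table(group, standardized = std_fit$clustering)
```

On raw values the large-scale column dominates every pairwise distance, so one would expect the unstandardized partition to track `noise` rather than the groups. Standardizing is only one possible transformation; whether it is appropriate depends on what the variables mean.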