Re: [R] which function to use to do classification

2006-03-30 Thread Adaikalavan Ramasamy
I find it helpful to explain to my colleagues from non-mathematical
backgrounds that in classification the classes are predefined and in
clustering the classes (and sometimes the number of classes) are not.

I prefer the term "class discovery" over "clustering" when
people try to cluster samples in order to derive meaningful classes.
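
The distinction above can be sketched in a few lines of R. This is only an
illustration, not something posted in the thread: the built-in iris data, the
use of MASS::lda() as the classifier and the choice of kmeans() as the
clustering method are all assumptions made for the example.

```r
library(MASS)  # lda(); MASS is a recommended package shipped with R

x <- iris[, 1:4]

# Classification: the classes (Species) are predefined and used when fitting.
fit  <- lda(Species ~ ., data = iris)
pred <- predict(fit, iris)$class
table(known = iris$Species, predicted = pred)

# Clustering (class discovery): no labels are supplied, only the number of groups.
set.seed(1)
km <- kmeans(x, centers = 3, nstart = 20)
table(known = iris$Species, cluster = km$cluster)
```

In the second table the cluster numbers are arbitrary labels; only the
grouping itself carries meaning.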

Regards, Adai



On Wed, 2006-03-29 at 18:52 -0500, Liaw, Andy wrote:
> In addition to Brian's comment, Gordon's book, already in 2nd edition, is
> all about clustering, but the title is simply `Classification'.
>
> Andy
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] which function to use to do classification

2006-03-29 Thread Sean Davis
We have to be careful here.  Classification (which is the terminology that
the original poster used) is NOT the same as clustering, although the two
are often confused.  If the original poster wants to do clustering and
examine the results for the presence of three clusters, that is fine and
there are many methods for clustering that could be used.  However,
classification will require a different set of tools.  If the clustering
tools already pointed out are not doing what is needed (that is, that Cao
actually is interested in clustering and not classification), then perhaps a
further explanation of what the problem is would help clarify.

Sean




Re: [R] which function to use to do classification

2006-03-29 Thread Prof Brian Ripley

On Wed, 29 Mar 2006, Sean Davis wrote:


> We have to be careful here.  Classification (which is the terminology that
> the original poster used) is NOT the same as clustering, although the two
> are often confused.


Well, in one of its two English senses it is the same.  From a recent talk 
of mine (GfKL30), quoting the Concise Oxford Dictionary:


\emph{Classification} has two senses:

\begin{itemize}
\item `to arrange in classes or categories'
\item `assign (a thing) to a class or category'
\end{itemize}

There is a community (q.v. the International Federation of Classification 
Societies and Journal of Classification as well as the entry in the 
original Encyclopedia of Statistical Sciences) that means (almost) 
entirely the first sense.


To add to this, the similar words to classification in e.g. French or 
German have (I am told) different shades of meaning.




> If the original poster wants to do clustering and
> examine the results for the presence of three clusters, that is fine and
> there are many methods for clustering that could be used.  However,
> classification will require a different set of tools.  If the clustering
> tools already pointed out are not doing what is needed (that is, that Cao
> actually is interested in clustering and not classification), then perhaps a
> further explanation of what the problem is would help clarify.


Yes, further explanation would help.





--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: [R] which function to use to do classification

2006-03-29 Thread Baoqiang Cao


> Yes, further explanation would help.

My intention is to arrange all the samples in classes. As a non-native English
speaker, I should have checked the word before using it to express
myself. The quotation makes perfect sense to me. Much appreciated!

Thank you, Jacques and Martin; your comments and suggestions are well received!

Best,
 Baoqiang Cao



= = = = = = = = = = = = = = = = = = = =

Baoqiang Cao
[EMAIL PROTECTED]
2006-03-29


Re: [R] which function to use to do classification

2006-03-29 Thread Liaw, Andy
In addition to Brian's comment, Gordon's book, already in 2nd edition, is
all about clustering, but the title is simply `Classification'.

Andy



[R] which function to use to do classification

2006-03-28 Thread Baoqiang Cao
Dear All,

I have a data set; suppose it is an N*M matrix. All I want is to classify it
into, let's say, 3 classes. Which method(s) do you think is (are) appropriate for
this purpose? Any reference will be welcome! Thanks!

Best, 
  Baoqiang Cao


Re: [R] which function to use to do classification

2006-03-28 Thread Jacques VESLOT
if you want to classify rows or columns, read:
?hclust
?kmeans
library(cluster)
?pam
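
A small sketch of the "rows or columns" point, under assumed data: the
20 x 5 matrix `mat` below is simulated purely for illustration. dist() and
pam() treat the rows of their input as the objects to group, so grouping
columns means transposing first.

```r
library(cluster)  # pam()

set.seed(1)
mat <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)  # hypothetical 20 x 5 data

# dist() computes distances between ROWS, so this groups the 20 rows.
row_groups <- cutree(hclust(dist(mat)), k = 3)

# To group the 5 columns instead, transpose the matrix first.
col_groups <- pam(t(mat), k = 3)$clustering
```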



Re: [R] which function to use to do classification

2006-03-28 Thread Baoqiang Cao
Thanks!
I tried kmeans; the results are not very positive. Anyway, thanks Jacques!
Please let me know if you have any other thoughts!

Best regards, 
Baoqiang Cao


Re: [R] which function to use to do classification

2006-03-28 Thread Jacques VESLOT
Try this (suppose mat is your matrix):

# "ward" is called "ward.D" in R >= 3.1.0
hc <- hclust(dist(mat, "manhattan"), "ward")
plot(hc, hang = -1)
(x <- identify(hc)) # right-click to stop
cutree(hc, 3)

km <- kmeans(mat, 3)
km$cluster
km$centers

library(cluster)
pam(daisy(mat, metric = "manhattan"), k = 3, diss = TRUE)$clustering
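
To try the three approaches above without real data at hand, here is a
self-contained sketch. The simulated matrix, the seed, the "ward.D" linkage
name (the modern spelling of "ward") and the cross-tabulation step are
assumptions added for illustration; the interactive identify() call is left
out so the script runs unattended.

```r
library(cluster)  # pam()

set.seed(42)
# Hypothetical data: three shifted groups of 30 rows each, 4 columns.
mat <- rbind(matrix(rnorm(120, mean = 0), ncol = 4),
             matrix(rnorm(120, mean = 4), ncol = 4),
             matrix(rnorm(120, mean = 8), ncol = 4))

d <- dist(mat, method = "manhattan")
hc_groups  <- cutree(hclust(d, method = "ward.D"), k = 3)
km_groups  <- kmeans(mat, centers = 3, nstart = 10)$cluster
pam_groups <- pam(d, k = 3, diss = TRUE)$clustering

# Cluster labels are arbitrary, so compare partitions by cross-tabulation.
table(hclust = hc_groups, kmeans = km_groups)
table(hclust = hc_groups, pam = pam_groups)
```

Note that kmeans() works on the raw matrix with Euclidean distances, whereas
the other two calls use the Manhattan dissimilarities in `d`.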




Re: [R] which function to use to do classification

2006-03-28 Thread Martin Maechler
>>>>> "Baoqiang" == Baoqiang Cao [EMAIL PROTECTED]
>>>>>     on Wed, 29 Mar 2006 00:46:01 -0500 writes:

    Baoqiang> Thanks!
    Baoqiang> I tried kmeans; the results are not very positive. Anyway, thanks
    Baoqiang> Jacques! Please let me know if you have any other thoughts!

My first recommendation would have been pam(),
but Jacques mentioned that as well.

HOWEVER, note that many (unfortunately, nowadays even most) people doing
cluster analysis have forgotten (or never knew) the importance of the
similarity / dissimilarity / distance that underlies almost all
clustering methods (see the functions dist() and cluster::daisy()).
The choice of dissimilarity includes variable transformation,
selection, etc.: things which need thinking in addition to
software.

If you don't get very positive results it could well be that
you should start considering the above.
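
As a rough illustration of why the dissimilarity matters, the sketch below
builds the same two-column data into two different dissimilarities and
clusters each with pam(). The simulated columns `signal` and `noise`, the
seed and the use of daisy(..., stand = TRUE) as the "transformation" are
assumptions chosen for the example, not part of the advice above.

```r
library(cluster)  # daisy(), pam()

set.seed(7)
# Hypothetical data: the grouping lives in `signal`;
# `noise` is unrelated to the groups and on a much larger scale.
grp    <- rep(1:2, each = 25)
signal <- rnorm(50, mean = c(0, 5)[grp], sd = 1)
noise  <- rnorm(50, mean = 0, sd = 100)
mat    <- cbind(signal, noise)

d_raw    <- daisy(mat, metric = "manhattan")                # noise dominates the distances
d_scaled <- daisy(mat, metric = "manhattan", stand = TRUE)  # columns standardized first

# Same method, same k; only the dissimilarity differs.
table(truth = grp, raw    = pam(d_raw,    k = 2)$clustering)
table(truth = grp, scaled = pam(d_scaled, k = 2)$clustering)
```

Comparing the two tables shows how much of the result is decided before the
clustering function is ever called.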

Martin Maechler, ETH Zurich

