Dear abanero,

In principle, k-nearest-neighbours classification can be computed on any dissimilarity matrix. Unfortunately, knn and knn1 (package "class") seem to assume Euclidean vectors as input, which restricts their use.

I'd probably compute an appropriate dissimilarity between points (have a look at Gower's distance in daisy, package cluster), and then implement nearest neighbours classification myself if I needed it. It should be pretty straightforward.
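
Something along these lines should do (an untested sketch; 'train',
'labels' and 'new' are placeholder names for a labelled training data
frame, its class labels and the new observations):

library(cluster) # for daisy()

knn1.dissim <- function(train, labels, new) {
  ## 1-NN on a Gower dissimilarity; 'new' must have the same
  ## variables as 'train'
  n <- nrow(train)
  d <- as.matrix(daisy(rbind(train, new), metric = "gower"))
  ## for each new point, take the label of its nearest training point
  nearest <- apply(d[(n + 1):nrow(d), 1:n, drop = FALSE], 1, which.min)
  labels[nearest]
}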

If you want unsupervised classification (clustering) instead, you then have the choice between all kinds of dissimilarity-based algorithms (hclust, pam, agnes etc.).
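
For instance (a rough sketch; 'dat' is a placeholder for your data and
k = 3 is purely for illustration):

library(cluster)
d <- daisy(dat, metric = "gower")  # dissimilarity for mixed variables
groups.hc  <- cutree(hclust(d, method = "average"), k = 3)
groups.pam <- pam(d, k = 3)$clustering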

Christian

On Thu, 27 May 2010, Ulrich Bodenhofer wrote:


abanero wrote:

Do you know something like “knn1” that works with categorical
variables too? Do you have any suggestion?

There are surely plenty of clustering algorithms around that do not
require a vector space structure on the inputs (as KNN does). I think
agglomerative clustering would solve the problem, as would kernel-based
clustering (assuming that you have a way to compute a positive
semi-definite measure of the similarity of two samples). Probably the
simplest way is Affinity Propagation
(http://www.psi.toronto.edu/index.php?q=affinity%20propagation; see the
CRAN package "apcluster", which I have co-developed). All you need is a
way of measuring the similarity of samples, which is straightforward
both for numerical and categorical variables, as well as for mixtures
of both (the choice of the similarity measures and how to aggregate the
different variables is left to you, of course). Your final
"classification" task can then be accomplished simply by assigning the
new sample to the cluster whose exemplar is most similar.
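
For mixed variables, one possibility (an untested sketch; 'dat' and
'new' are placeholder names for your data and a new observation) is to
use one minus the Gower dissimilarity as the similarity matrix:

library(cluster)   # daisy()
library(apcluster) # apcluster()

## 'dat' is a placeholder for a mixed-type data frame
s <- 1 - as.matrix(daisy(dat, metric = "gower"))
res <- apcluster(s)  # affinity propagation on the similarity matrix
res@exemplars        # indices of the exemplars found

## assign a new sample to the cluster whose exemplar is most similar
s.new <- 1 - as.matrix(daisy(rbind(dat, new), metric = "gower"))
which.max(s.new[nrow(s.new), res@exemplars])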

Joris Meys wrote:

Not a direct answer, but from your description it looks like you are
better off with supervised classification algorithms instead of
unsupervised clustering.

If you say that this is a purely supervised task that can be solved without
clustering, I disagree. abanero does not mention any class labels. So it
seems to me that it is indeed necessary to do unsupervised clustering first.
However, I agree that the second task of assigning new samples to
clusters/classes/whatever can also be solved by almost any supervised
technique if samples are labeled according to their cluster membership
first.
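
For instance (sketch only; k = 3 is arbitrary and 'dat' is a
placeholder), the labels could come from a PAM clustering:

library(cluster)
cl <- pam(daisy(dat, metric = "gower"), k = 3)$clustering
## 'cl' can now serve as the class label vector for any supervised
## method, e.g. the nearest-neighbour sketch earlier in this thread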

Cheers, Ulrich

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche