Re: [R] distance in the function kmeans

2004-05-28 Thread Uwe Ligges
n.bouget wrote:
> Hi,
> I want to know which distance is used in the function kmeans
> and whether we can change this distance.
> Indeed, in the function pam, we can pass a distance matrix as
> a parameter (with the line pam <- pam(dist(matrixdata), k=7)) but
> we can't do that in the function kmeans; we have to pass the
> data matrix directly ...
> Thanks in advance,
> Nicolas BOUGET

As the name says, kmeans() calculates *means* (centres) of clusters. It
does not make any sense to do that on distances ...

Uwe Ligges

Access La Poste e-mail: www.laposte.net;
3615 LAPOSTENET (0.34 €/min); tel: 08 92 68 13 50 (0.34 €/min)

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] distance in the function kmeans

2004-05-28 Thread Christian Hennig
On Fri, 28 May 2004, Uwe Ligges wrote:

> As the name says, kmeans() calculates *means* (centres) of clusters. It
> does not make any sense to do that on distances ...
>
> Uwe Ligges

That's not really true. There is an equivalent formulation of the k-means
target criterion in terms of pairwise distances, and it uses squared
Euclidean distances. However, as far as I know, you cannot compute it
directly in R for any other distance. Using pam is the thing that comes
closest.
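
A minimal sketch of the pam route, for readers who want to try it. It
assumes the recommended package cluster is installed; the iris data, the
Manhattan distance and k = 3 are illustrative choices that do not come from
the thread.

```r
library(cluster)                      # provides pam(); assumed to be installed

x <- as.matrix(iris[, 1:4])           # illustrative data, not from the thread

# pam() accepts any dissimilarity; Manhattan distance is used as an example
d <- dist(x, method = "manhattan")

fit <- pam(d, k = 3)                  # a "dist" object is treated as dissimilarities
table(fit$clustering)                 # cluster sizes
fit$id.med                            # row indices of the medoids
```

The cluster representatives are observations (medoids), not means, which is
why an arbitrary dissimilarity is enough here.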

Christian Hennig


***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
###
I recommend www.boag-online.de



Re: [R] distance in the function kmeans

2004-05-28 Thread [EMAIL PROTECTED]
> n.bouget wrote:
>> [...] in the function pam, we can pass a distance matrix as a
>> parameter, but we can't do that in the function kmeans; we have
>> to pass the data matrix directly ...

Yes, but how can we choose the distance used to calculate the centers?

> As the name says, kmeans() calculates *means* (centres) of clusters. It
> does not make any sense to do that on distances ...
>
> Uwe Ligges
 



Re: [R] distance in the function kmeans

2004-05-28 Thread Uwe Ligges
[EMAIL PROTECTED] wrote:
> Yes, but how can we choose the distance used to calculate the centers?

Ah, you want to use a different distance measure (e.g. Euclidean,
Manhattan, ...) as in other clustering methods? Well, that's not possible
with the kmeans() implementation. See ?kmeans, which tells you:

  The data given by x is clustered by the k-means algorithm. When this
  terminates, all cluster centres are at the mean of their Voronoi sets
  (the set of data points which are nearest to the cluster centre).
  The algorithm of Hartigan and Wong (1979) is used.

Of course, you can do some projection based on the calculated distances,
but I don't think there are functions available that do this completely
automatically, and interpreting the results won't be that easy ...
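
A minimal sketch of one possible projection. The thread does not name a
method, so classical multidimensional scaling (cmdscale() in the stats
package) is an assumption here, as are the iris data, the Manhattan
distance, two retained dimensions and k = 3.

```r
x <- as.matrix(iris[, 1:4])              # illustrative data, not from the thread
d <- dist(x, method = "manhattan")       # the distance we actually care about

# Classical MDS: coordinates whose Euclidean distances approximate d
coords <- cmdscale(d, k = 2)

set.seed(1)                              # kmeans() uses random starting centres
fit <- kmeans(coords, centers = 3, nstart = 10)
table(fit$cluster)                       # cluster sizes
```

The resulting centres are means in the projected space, not in the original
variables, which is the interpretation problem mentioned above.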

Uwe Ligges




Re: [R] distance in the function kmeans

2004-05-28 Thread Martin Maechler
n.bouget wrote on Fri, 28 May 2004 09:37:35 +0200:

> Hi, I want to know which distance is used in the
> function kmeans and whether we can change this distance.
> [...]

It might be interesting to look at this from the pam()
perspective:
What exactly is pam() lacking that kmeans() does for you?

Christian, are you suggesting that pam() could do the job if

1) there were a dist(., method = "a la kmeans"), and
2) pam() could be started from a user-specified set of
   medoids instead of the Kaufman-Rousseeuw-optimal ones?
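
For readers trying this today: current versions of the cluster package do
provide a medoids argument to pam() for supplying initial medoids. A
minimal sketch, in which the data and the starting indices are arbitrary
illustrative choices and not part of the thread:

```r
library(cluster)                       # assumed to be installed

x <- as.matrix(iris[, 1:4])            # illustrative data
start <- c(1, 51, 101)                 # arbitrary initial medoids (row indices)

# Squared Euclidean dissimilarities, started from user-chosen medoids
fit <- pam(dist(x)^2, k = 3, medoids = start)
fit$id.med                             # medoid indices after the swap phase
```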

Regards,
Martin Maechler



Re: [R] distance in the function kmeans

2004-05-28 Thread Christian Hennig
On Fri, 28 May 2004, Martin Maechler wrote:

> It might be interesting to look at this from the pam()
> perspective:
> What exactly is pam() lacking that kmeans() does for you?
>
> Christian, are you suggesting that pam() could do the job if
>
> 1) there were a dist(., method = "a la kmeans"), and
> 2) pam() could be started from a user-specified set of
>    medoids instead of the Kaufman-Rousseeuw-optimal ones?

The k-means criterion is equivalent to:
Find a partition C = C_1 \cup ... \cup C_k such that

  \sum_{i=1}^k \sum_{x_j, x_l \in C_i} d(x_j, x_l) / |C_i| = min!

where d is the squared Euclidean distance (see the Bock book). You may
wonder what clustering this would lead to with another distance.
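
The equivalence can be checked numerically. The sketch below uses the iris
data and k = 3 as illustrative assumptions. It counts each pair of points
once; counting ordered pairs instead doubles the criterion but does not
change the minimizing partition.

```r
x <- as.matrix(iris[, 1:4])               # illustrative data
set.seed(1)
fit <- kmeans(x, centers = 3, nstart = 10)

d2 <- as.matrix(dist(x))^2                # squared Euclidean distances

# For each cluster: sum of d over unordered pairs, divided by cluster size
crit <- sum(sapply(split(seq_len(nrow(x)), fit$cluster), function(idx) {
  sum(d2[idx, idx]) / 2 / length(idx)     # the full matrix counts each pair twice
}))

# Agrees with the usual within-cluster sum of squares around the means
all.equal(crit, fit$tot.withinss)
```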

The difference from pam is that pam minimizes the sum of distances to
centroid objects (medoids), which have to be members of the dataset. In the
distance-based formulation, k-means needs no centroid objects at all, not
even the means. Thus, pam with squared Euclidean distances is a kind of
approximation to k-means. (In practice, both algorithms only approximate
the global optimum of their criterion.)
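
A minimal sketch comparing the two on the same data, assuming the cluster
package is installed. The data, k = 3 and the helper wss() are illustrative
and not part of the thread.

```r
library(cluster)                          # assumed to be installed

x <- as.matrix(iris[, 1:4])               # illustrative data
set.seed(1)
km <- kmeans(x, centers = 3, nstart = 10)
pm <- pam(dist(x)^2, k = 3)               # pam on squared Euclidean distances

# Hypothetical helper: k-means criterion (sum of squares around the
# cluster means) evaluated for an arbitrary partition 'cl'
wss <- function(cl) {
  sum(sapply(split(as.data.frame(x), cl), function(g) {
    sum(scale(as.matrix(g), scale = FALSE)^2)
  }))
}

c(kmeans = wss(km$cluster), pam = wss(pm$clustering))
table(kmeans = km$cluster, pam = pm$clustering)   # cross-tabulate the partitions
```

Evaluating both partitions with the same criterion shows how close the pam
solution comes to the k-means one on a given dataset.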

A further variant would be possible if other distances were allowed:
optimize the pam criterion, but let the cluster centers lie anywhere, not
only on an object of the sample.

Of course, pam and the original k-means are more or less easy to compute,
while the suggested alternatives may be computationally expensive.

Best,
Christian


 


Re: [R] distance in the function kmeans

2004-05-28 Thread Thomas Petzoldt
n.bouget wrote:
> I want to know which distance is used in the function kmeans
> and whether we can change this distance.
> [...]

One solution is to transform the data in such a way that the Euclidean
distance between the transformed values represents some other distance
between the original values. This works at least for the Mahalanobis
distance, when one applies a multivariate technique to a PCA-transformed
and rescaled matrix, but I don't know whether there are transformations
for other distance measures.
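
The Mahalanobis case can be checked in a few lines. This sketch uses a
Cholesky-based whitening instead of PCA (both produce data with identity
covariance) and the iris data as an illustrative assumption.

```r
x <- as.matrix(iris[, 1:4])          # illustrative data
S <- cov(x)

# Whitening: with S = R'R (Cholesky factor R), z = x R^{-1} has identity covariance
R <- chol(S)
z <- x %*% solve(R)

# Euclidean distance between rows 1 and 2 of z ...
d_euc <- as.numeric(as.matrix(dist(z))[1, 2])
# ... equals the Mahalanobis distance between rows 1 and 2 of x
# (mahalanobis() returns the squared distance)
d_mah <- as.numeric(sqrt(mahalanobis(x[1, ], center = x[2, ], cov = S)))

all.equal(d_euc, d_mah)
```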

Thomas P.


Re: [R] distance in the function kmeans

2004-05-28 Thread [EMAIL PROTECTED]
I don't understand exactly what you do. Could you show me the
code that you run to do that?

Thomas Petzoldt wrote:
> One solution is to transform the data in such a way that the Euclidean
> distance between the transformed values represents some other distance
> between the original values. [...]
 



Re: [R] distance in the function kmeans

2004-05-28 Thread Gavin Simpson
Thomas Petzoldt wrote:
> One solution is to transform the data in such a way that the Euclidean
> distance between the transformed values represents some other distance
> between the original values. This works at least for the Mahalanobis
> distance, [...] but I don't know whether there are transformations
> for other distance measures.

Other solutions from an ecological paper are:

  Chord distance
  Chi-square metric
  Chi-square distance
  Hellinger distance
  Distance between species profiles

All of these can be seen as Euclidean distances on some transformation of
the data.

The paper "Ecologically meaningful transformations for ordination of
species data" by Pierre Legendre and Eugene D. Gallagher (2001), Oecologia
129(2), 271-280, explains the concept and how to do the transformations.

An R example is given in the help file of decostand() in Jari Oksanen's 
vegan library for two of the transformations mentioned above.
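
A minimal base-R sketch of two of these transformations followed by
kmeans(). The abundance table is made up for illustration and k = 2 is
arbitrary. In vegan, decostand(counts, method = "hellinger") should give the
same Hellinger-transformed table.

```r
# Toy abundance table: rows are sites, columns are species (made-up numbers)
counts <- matrix(c(10,  0,  5,
                    8,  1,  6,
                    0, 12,  3,
                    1, 10,  2,
                    4,  4,  4,
                    5,  3,  5),
                 ncol = 3, byrow = TRUE)

# Hellinger transformation: square root of row-wise relative abundances
hel <- sqrt(counts / rowSums(counts))

# Chord transformation: scale each site vector to unit length
chord <- counts / sqrt(rowSums(counts^2))

# Euclidean distance on 'hel' is the Hellinger distance between sites, so
# kmeans() on 'hel' clusters with respect to that distance
set.seed(1)
fit <- kmeans(hel, centers = 2, nstart = 10)
fit$cluster
```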

Gav
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [T] +44 (0)20 7679 5522
ENSIS Research Fellow             [F] +44 (0)20 7679 7565
ENSIS Ltd.  ECRC                  [E] [EMAIL PROTECTED]
UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
London.  WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


Re: [R] distance in the function kmeans

2004-05-28 Thread Gavin Simpson
Gavin Simpson wrote:
> ...
> An R example is given in the help file of decostand() in Jari Oksanen's
> vegan library for two of the transformations mentioned above.
        ^^^^^^^
Pre-empting the usual response about proper terminology, I of course
meant "package", not "library".

Gav


Re: [R] distance in the function kmeans

2004-05-28 Thread Thomas Petzoldt
[EMAIL PROTECTED] wrote:
> I don't exactly understand what you do, could you show me the
> program that you execute to do that?

I did such things some time ago, so the following is (as usual) without
warranty. There are several methods, e.g. using Cholesky factorization,
singular value decomposition or principal components. Given mdata as the
original data matrix, it works with hclust and should be applicable to
kmeans too:
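
# with svd: centre the data and keep the left singular vectors
z <- svd(scale(mdata, scale = FALSE))$u
# the method "ward" of 2004 is called "ward.D" in R >= 3.1.0
cl <- hclust(dist(z), method = "ward.D")

# with princomp (rescaled scores)
pc <- princomp(mdata, cor = FALSE)
pcdata <- as.data.frame(scale(pc$scores))
cl <- hclust(dist(pcdata), method = "ward.D")
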
... but as I mentioned, this is only an example showing that methods
working with the Euclidean distance can be applied to other distance
measures when an appropriate transformation of the data exists and,
according to Gavin, there are indeed some other possibilities.
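
A minimal sketch of the same idea with kmeans(). The iris data stand in for
mdata, and the helper whiten() and k = 3 are illustrative assumptions. The
last lines check that rescaling a variable leaves the distances between the
transformed rows unchanged, which is the Mahalanobis-type behaviour.

```r
mdata <- as.matrix(iris[, 1:4])               # stand-in for the poster's 'mdata'

# Hypothetical helper: centre the data and keep the left singular vectors
whiten <- function(m) svd(scale(m, scale = FALSE))$u

z <- whiten(mdata)
set.seed(1)                                   # kmeans() uses random starting centres
km <- kmeans(z, centers = 3, nstart = 10)
table(km$cluster)

# Rescaling one variable changes raw Euclidean distances but not those in z
mdata2 <- mdata
mdata2[, 1] <- mdata2[, 1] * 1000
all.equal(as.vector(dist(z)), as.vector(dist(whiten(mdata2))))
```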
Thomas P.