[R] clustering with hclust

2014-07-25 Thread Marianna Bolognesi
Hi everybody, I have a problem with a cluster analysis.

I am trying to use hclust with method="ward".

The Ward method works with SQUARED Euclidean distances.

Hclust demands a dissimilarity structure as produced by dist.

Yet, dist does not seem to produce a table of squared euclidean distances,
starting from cosines.
In fact, computing manually the squared euclidean distances from cosines
(d=2(1-cos)) produces a different outcome.

As a consequence, using hclust with the ward method on a table of cosines
transformed into distances with dist produces a different dendrogram than
other programs for hierarchical clustering with the ward method (e.g.
multidendrograms). Weird, right??

Computing manually the distances and then feeding them to hclust produces
an error message. So, I am wondering, what the hell is this dist function
doing?!
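
For readers following along, here is a minimal sketch of what dist does when it is handed a table of cosines, compared with the manual conversion d=2(1-cos). The toy matrix x and the names xn, cosm, d_rows and d_sq are hypothetical and not from the original post. The sketch also shows as.dist, which wraps an existing square matrix into the "dist" structure hclust asks for; passing a plain matrix instead is one plausible cause of the error mentioned above, though the post does not say what the error was.

```r
# Hypothetical toy data: 4 items described by 3 features (not from the thread).
set.seed(1)
x <- matrix(runif(12), nrow = 4,
            dimnames = list(paste0("item", 1:4), NULL))

# Cosine similarity between the rows of x.
xn <- x / sqrt(rowSums(x^2))   # scale every row to unit length
cosm <- xn %*% t(xn)           # 4 x 4 table of cosines

# (a) dist() treats each ROW of its argument as an observation, so this
#     gives Euclidean distances between the rows of the cosine table.
#     It does not convert the cosines themselves into distances.
d_rows <- dist(cosm)

# (b) Manual conversion, wrapped with as.dist() so that the result has
#     class "dist". For unit-length rows, 2 * (1 - cos) is the squared
#     Euclidean distance between them.
d_sq <- as.dist(2 * (1 - cosm))

# (a) and (b) hold different numbers; (b) matches dist(xn)^2.
print(round(d_rows, 3))
print(round(d_sq, 3))
print(all.equal(as.vector(d_sq), as.vector(dist(xn)^2)))
```

Either object can be passed to hclust, but they describe different geometries, so different dendrograms are to be expected.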

thanks!

marianna


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] clustering with hclust

2014-07-25 Thread Christian Hennig

Dear Marianna,

the function agnes in the package cluster can compute Ward's method from a raw
data matrix (at least this is what the help page suggests).
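
A minimal sketch of that suggestion, assuming the cluster package is installed (it is a recommended package in standard R distributions). The data matrix x is hypothetical, and the comparison against hclust on Euclidean distances is only a sanity check, not something taken from the help page.

```r
library(cluster)

# Hypothetical raw data: 10 observations on 4 variables.
set.seed(1)
x <- matrix(rnorm(40), nrow = 10)

# Ward clustering straight from the raw data matrix.
ag <- agnes(x, method = "ward")

# The same data via an explicit Euclidean dist object and hclust.
hc <- hclust(dist(x), method = "ward.D2")

# Cut both trees into 3 groups and cross-tabulate the memberships.
groups_agnes  <- cutree(as.hclust(ag), k = 3)
groups_hclust <- cutree(hc, k = 3)
print(table(agnes = groups_agnes, hclust = groups_hclust))
```

If the two routes agree, every row of the cross-tabulation has a single non-zero cell.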


Also, you may not be using the most recent version of hclust. The most 
recent version has a note in its help page that states:


Two different algorithms are found in the literature for Ward clustering.
The one used by option "ward.D" (equivalent to the only Ward option "ward"
in R versions <= 3.0.3) does not implement Ward's (1963) clustering
criterion, whereas option "ward.D2" implements that criterion (Murtagh and
Legendre 2013). With the latter, the dissimilarities are squared before
cluster updating. Note that agnes(*, method="ward") corresponds to
hclust(*, "ward.D2").
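
The relationship described in that note can be checked with a short sketch. The data and object names below are hypothetical. It assumes an R version that offers both "ward.D" and "ward.D2", and compares "ward.D2" on plain Euclidean distances with "ward.D" on squared ones.

```r
# Hypothetical raw data: 10 observations on 4 variables.
set.seed(1)
x <- matrix(rnorm(40), nrow = 10)
d <- dist(x)   # plain (unsquared) Euclidean distances

# Ward's criterion: ward.D2 squares the dissimilarities internally.
hc_d2 <- hclust(d, method = "ward.D2")

# The older algorithm fed with distances squared by hand.
hc_d_sq <- hclust(d^2, method = "ward.D")

# The older algorithm fed with unsquared distances
# (per the help note, this does not implement Ward's criterion).
hc_d <- hclust(d, method = "ward.D")

# Same sequence of merges; heights differ only by a square root.
print(all(hc_d2$merge == hc_d_sq$merge))
print(all.equal(hc_d2$height, sqrt(hc_d_sq$height)))

# Heights from the unsquared ward.D run, for comparison.
print(rbind(ward.D2 = hc_d2$height, ward.D = hc_d$height))
```

So a program that expects squared Euclidean distances should be compared with either hclust(d, "ward.D2") or hclust(d^2, "ward.D"), not with hclust(d, "ward.D").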


The Murtagh and Legendre paper has more details on this and is here:
http://arxiv.org/abs/.6285
F. Murtagh and P. Legendre, Ward's hierarchical clustering method: 
clustering criterion and agglomerative algorithm


It's not clear to me why one would want to use Ward's method for this kind 
of data, but that's your decision of course.


Best wishes,
Christian




*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
c.hen...@ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
