Re: [R] Help with clustering

Christian Hennig Mon, 26 Jan 2009 05:11:34 -0800

Generally, how to scale different variables when aggregating them in a dissimilarity measure depends strongly on the subject matter, on the aim of the clustering, and on your "cluster concept". This cannot be answered properly on a mailing list such as this one.

A standard transformation before computing dissimilarities would be to scale all variables to variance 1 by dividing by their standard deviations. This gives all variables the same weight in a well-defined sense (which may be somewhat affected by outliers, heavy tails and skewness; note, however, that normalising to the same range shares these problems more severely).
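
A minimal sketch of the standardisation described above, written in Python purely for illustration (in R, `scale()` followed by `dist()` would be the usual route). The data values and the helper names `scale_to_unit_variance` and `distance_matrix` are assumptions made up for this example; the values are only chosen to fall inside the three ranges given in the question.

```python
from math import sqrt
from statistics import stdev


def scale_to_unit_variance(columns):
    """Divide every variable (column) by its sample standard deviation.

    Assumes no column is constant; a constant column has standard
    deviation 0 and cannot be scaled this way.
    """
    scaled = []
    for col in columns:
        sd = stdev(col)
        scaled.append([x / sd for x in col])
    return scaled


def distance_matrix(columns):
    """Euclidean distances between observations (rows)."""
    rows = list(zip(*columns))
    return [
        [sqrt(sum((a - b) ** 2 for a, b in zip(r, s))) for s in rows]
        for r in rows
    ]


# Made-up observations, one list per variable (hypothetical data).
item1 = [70, 150, 230, 310, 525]          # integers, roughly 70 to 525
item2 = [0.05, 0.20, 0.45, 0.70, 0.95]    # reals between 0 and 1
item3 = [1, 3, 5, 8, 10]                  # integers, 1 to 10

raw = [item1, item2, item3]
scaled = scale_to_unit_variance(raw)

d_raw = distance_matrix(raw)        # dominated by item1
d_scaled = distance_matrix(scaled)  # each variable now has variance 1

print("raw distance, obs 1 vs 5:   ", d_raw[0][4])
print("scaled distance, obs 1 vs 5:", d_scaled[0][4])
```

On the raw data the distance between two observations is almost entirely the difference in Item-1; after scaling, all three variables contribute on a comparable footing. Whether that equal weighting is what you want is still a subject-matter decision.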


Regards,
Christian

On Mon, 26 Jan 2009, mau...@alice.it wrote:

I am going to try out a tentative clustering of some feature vectors.
The range of values spanned by the three items making up the feature vector is
quite different:

Item-1 goes roughly from 70 to 525 (integer numbers only)
Item-2 is in-between 0 and 1 (all real numbers between 0 and 1)
Item-3 goes from 1 to 10 (integer numbers only)

In order to spread out Item-2 even further I might try to replace Item-2 with 
Log10(Item-2).

My concern is that, regardless of the distance measure used, the item whose order
of magnitude is the highest may carry the highest weight in the process of
calculating the similarity matrix, thereby fading out the influence of the
items with smaller variation in the resulting clusters.
Should I normalize all feature vector elements to 1 in advance of generating
the similarity matrix?

Thank you so much.
Maura








______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
