Read up on Gower's distance measure (available in the ecodist
package), which can combine numeric and categorical data. You
didn't give us any information about how you numerically
transformed the categorical variables, but the usual approach
is to create indicator variables that code presence/absence
for each category within a categorical variable. Differences
in variance between variables can be removed by standardizing
the variables.
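
As a rough sketch of the two suggestions above (not tested on your
data): the toy data frame and its column names are made up for
illustration, and I am assuming daisy() from the cluster package for
the Gower dissimilarity rather than ecodist, since daisy() accepts
factors directly. The choice of average-linkage hclust() and k = 2 is
likewise arbitrary.

```r
# Hypothetical toy data: one large-scale numeric field, one small-scale
# numeric field, and one categorical field stored as a factor.
dat <- data.frame(
  income = c(12000, 15000, 58000, 61000, 13000, 60000),
  visits = c(1, 2, 9, 10, 1, 8),
  region = factor(c("north", "north", "south", "south", "east", "south"))
)

# Option 1: Gower dissimilarity on the mixed data frame.
# daisy() range-scales numeric columns and uses simple matching for
# factors, so no manual recoding of 'region' is needed.
library(cluster)
d_gower <- daisy(dat, metric = "gower")
hc_gower <- hclust(d_gower, method = "average")
groups_gower <- cutree(hc_gower, k = 2)

# Option 2: indicator (0/1) columns for each category, then standardize.
# The "- 1" drops the intercept so every level of 'region' gets a column.
ind <- model.matrix(~ region - 1, data = dat)
x <- scale(cbind(as.matrix(dat[, c("income", "visits")]), ind))
d_euc <- dist(x)
hc_euc <- hclust(d_euc, method = "average")
groups_euc <- cutree(hc_euc, k = 2)
```

Whether the indicator columns should be standardized along with the
numeric ones is a judgment call; standardizing everything, as done
here, is only one option.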

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Li, Yan
Sent: Thursday, August 1, 2013 11:00 AM
To: r-help@r-project.org
Subject: [R] algorithm for clustering categorical data

Hi All,

Does anyone know of an algorithm for clustering categorical
variables, and of R packages that implement one? Which is the best?

If a data set has both numeric and categorical variables, what
is the best clustering algorithm and R package to use?

I tried a numeric transformation of all the categorical fields and
did the clustering afterwards. But the transformed fields have
values from 1...10, and my other fields are on a bigger scale:
10000-... This gives the categorical fields less effect
on the distance calculation...

Thank you!
Yan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible
code.
