All,

I'm relatively new to using R, having used it thus far for some simple
statistics and plotting. However, I'm not new to programming by any
measure.

I've been looking at the various packages available for clustering,
factor analysis, etc., and find that I need advice on which ones I
should focus on and how to apply them.

I have a data set whose columns hold both quantitative and
qualitative (non-numeric) attributes. I would like to perform two
operations on this data: identify correlations between attributes,
and cluster the records by attribute.

All of the clustering algorithms that I've looked at so far are based
on numerical distance functions, and it's not clear to me how I'd
apply them to qualitative attributes. It's not appropriate to simply
convert discrete qualitative attributes (e.g., native language) to
numerical values or to independent columns with binary values. Is
there a package that provides such an algorithm or that can be adapted
to this purpose?
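
To make the question concrete, here is a minimal sketch of the kind of
data I mean and of one candidate approach. The data frame and its
column names are made up for illustration. I am assuming that Gower's
dissimilarity, as computed by daisy() in the cluster package, is a
reasonable fit for mixed columns, and the choice of k = 2 is arbitrary;
corrections are welcome.

```r
library(cluster)  # recommended package, normally installed with R

# Hypothetical toy data: two numeric columns and two factors
people <- data.frame(
  age      = c(23, 35, 41, 29, 52, 38),
  income   = c(28, 54, 61, 33, 75, 58),
  language = factor(c("en", "fr", "fr", "en", "de", "fr")),
  employed = factor(c("no", "yes", "yes", "no", "yes", "yes"))
)

# Gower dissimilarity: range-scaled absolute differences for numeric
# columns, simple match/mismatch for factors, averaged over columns
d <- daisy(people, metric = "gower")

# Partitioning around medoids works directly on the dissimilarities;
# k = 2 is an arbitrary choice for this sketch
fit <- pam(d, k = 2)

# One cluster label per record
fit$clustering
```

Any method that accepts a dissimilarity object (hclust(), for example)
could be substituted for pam() here.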

I can wrap my head around the problem of looking for cross-correlation
between the attributes, but would appreciate any insight into how to
do it most efficiently and how to present the results.
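
For reference, this is roughly what I have in mind for the pairwise
measures, as a base-R sketch on made-up data. The helper names
cramers_v and eta_squared are my own, and the choice of measure per
pair type (Pearson correlation, Cramer's V, eta squared from a one-way
ANOVA) is an assumption on my part rather than a recommendation.

```r
# Hypothetical toy data: two numeric columns and two factors
people <- data.frame(
  age      = c(23, 35, 41, 29, 52, 38),
  income   = c(28, 54, 61, 33, 75, 58),
  language = factor(c("en", "fr", "fr", "en", "de", "fr")),
  employed = factor(c("no", "yes", "yes", "no", "yes", "yes"))
)

# Factor vs. factor: Cramer's V from the chi-squared statistic
# (warnings about small expected counts are suppressed for the toy data)
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Numeric vs. factor: share of variance explained by the grouping
eta_squared <- function(num, fac) {
  ss <- summary(aov(num ~ fac))[[1]][["Sum Sq"]]
  ss[1] / sum(ss)
}

r_age_income  <- cor(people$age, people$income)              # numeric vs. numeric
v_lang_emp    <- cramers_v(people$language, people$employed) # factor vs. factor
eta2_age_lang <- eta_squared(people$age, people$language)    # numeric vs. factor
```

These measures are on different scales, so I am not sure they can
sensibly share a single matrix or heat map, which is part of my
question about presentation.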

Thank you.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
