Re: [R] Looking for categorization method/module in R

2009-12-15 Thread David Winsemius


On Dec 15, 2009, at 7:19 AM, James Mcininch wrote:


All,

I'm relatively new to using R, having used it thus far for some simple
statistics and plotting. However, I'm not new to programming by any
measure.

I've been looking at the various modules available for clustering,
factor analysis, etc. and find that I need advice on which modules I
should be focusing on and their application.


The list is not really advertised as offering general statistical  
advice, but is more responsive to focussed questions on R use. There  
is the option of reviewing the Task Views:

http://cran.r-project.org/web/views/




I have a data set comprised of columns of both quantitative and
qualitative / non-numeric attributes. I would like to perform two
operations on this data: identify correlations between attributes,
and cluster the records by attribute.

All of the clustering algorithms that I've looked at so far are based
on numerical distance functions, and it's not clear to me how I'd
apply them to qualitative attributes. It's not appropriate to simply
convert discrete qualitative attributes (e.g., native language) to
numerical values or independent columns with binary values. Is there a
module that provides such an algorithm or that can be adapted to this
purpose?
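
A hedged sketch of one possible route, not a definitive answer: Gower's dissimilarity treats numeric and categorical columns together, and daisy() in the cluster package computes it when metric = "gower". The data frame, its column names, and the choice of two clusters below are invented purely for illustration.

```r
library(cluster)  # recommended package, normally installed with R

# Hypothetical mixed-type data: two numeric columns and one factor.
people <- data.frame(
  age      = c(23, 45, 31, 52, 27, 48),
  income   = c(30, 82, 41, 95, 35, 78),
  language = factor(c("en", "fr", "en", "fr", "en", "fr"))
)

# Gower dissimilarity: range-scaled absolute differences for numeric
# columns, simple mismatch (0 or 1) for factors, averaged over columns.
d <- daisy(people, metric = "gower")

# Methods that accept a dissimilarity object can use it directly,
# e.g. hierarchical clustering or partitioning around medoids.
hc     <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)   # k = 2 is an arbitrary choice here
fit    <- pam(d, k = 2)
```

Because the factor enters only through match/mismatch, no ordering or numeric coding of the categories is implied.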

I can wrap my head around the problem of looking for cross-correlation
between the attributes, but would appreciate any insight into how to
do it most efficiently and present the results.
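
For pairwise association across mixed column types, one common approach (an assumption here, not something stated in this thread) is to pick a measure per pair of types: cor() for two numeric columns, a chi-squared statistic on the contingency table for two factors (summarised below as Cramer's V), and a rank-based group comparison for a numeric column against a factor. The helper cramers_v and all data values are hypothetical.

```r
# Hypothetical helper: Cramer's V for two categorical vectors, derived
# from the Pearson chi-squared statistic (0 = no association, 1 = perfect).
# Assumes each variable has at least two observed levels.
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  # Small tables trigger an approximation warning; silenced for the sketch.
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  k    <- min(nrow(tab), ncol(tab))
  sqrt(as.numeric(chi2) / (sum(tab) * (k - 1)))
}

# Invented example data.
language <- factor(c("en", "en", "fr", "fr", "en", "fr", "en", "fr"))
region   <- factor(c("N",  "N",  "S",  "S",  "N",  "S",  "N",  "S"))
age      <- c(23, 27, 45, 52, 31, 48, 25, 50)
income   <- c(30, 35, 82, 95, 41, 78, 33, 90)

v_cat <- cramers_v(language, region)     # factor vs factor
r_num <- cor(age, income)                # numeric vs numeric
kw    <- kruskal.test(age ~ language)    # numeric vs factor
```

Collecting such values into a square matrix indexed by column name is one way to present the results, for example as a heat map.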

Thank you.




David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Looking for categorization method/module in R

2009-12-15 Thread James Mcininch
All,

I'm relatively new to using R, having used it thus far for some simple
statistics and plotting. However, I'm not new to programming by any
measure.

I've been looking at the various modules available for clustering,
factor analysis, etc. and find that I need advice on which modules I
should be focusing on and their application.

I have a data set comprised of columns of both quantitative and
qualitative / non-numeric attributes. I would like to perform two
operations on this data: identify correlations between attributes,
and cluster the records by attribute.

All of the clustering algorithms that I've looked at so far are based
on numerical distance functions, and it's not clear to me how I'd
apply them to qualitative attributes. It's not appropriate to simply
convert discrete qualitative attributes (e.g., native language) to
numerical values or independent columns with binary values. Is there a
module that provides such an algorithm or that can be adapted to this
purpose?

I can wrap my head around the problem of looking for cross-correlation
between the attributes, but would appreciate any insight into how to
do it most efficiently and present the results.

Thank you.




[R] Looking for categorization method/module in R

2009-12-11 Thread James Mcininch
All,

I'm relatively new to using R, having used it thus far for some simple
statistics and plotting. However, I'm not new to programming by any
measure.

I've been looking at the various modules available for clustering,
factor analysis, etc. and find that I need advice on which modules I
should be focusing on and their application.

I have a data set comprised of columns of both quantitative and
qualitative / non-numeric attributes. I would like to perform two
operations on this data: identify correlations between attributes,
and cluster the records by attribute.

All of the clustering algorithms that I've looked at so far are based
on numerical distance functions, and it's not clear to me how I'd
apply them to qualitative attributes. It's not appropriate to simply
convert discrete qualitative attributes (e.g., native language) to
numerical values or independent columns with binary values. Is there a
module that provides such an algorithm or that can be adapted to this
purpose?

I can wrap my head around the problem of looking for cross-correlation
between the attributes, but would appreciate any insight into how to
do it most efficiently and present the results.

Thank you.
