On 04/13/05 11:36, Chris Bergstresser wrote:
> Hi all -- I've got a large dataset which consists of a bunch of different scales, and I'm preparing to perform a cluster analysis. I need to normalize the data so I can calculate the difference matrix.
>
> First, I didn't see a function in R which does normalization -- did I miss it? What's the best way to do it?

Look at scale(). It might be what you mean.

> Second, what's the best way to deal with missing values? Obviously, I could just set them to 0 (the mean of the normalized scales), but I'm not sure that's the best way.

There are lots of ways to deal with missing data. The ones I've found most helpful are in the Hmisc package, particularly transcan() and aregImpute(). See
http://www.psych.upenn.edu/~baron/rpsych/rpsych.html#SECTION000715000000000000000
for an example of the latter.

But, in general, the "right" way to deal with missing data depends on the assumptions you make. As a novice, I found the following article to be helpful:

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R search page: http://finzi.psych.upenn.edu/
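
A rough sketch of the scale() suggestion above, using only base R. The data frame and its column names (scale_a, scale_b, scale_c) are made up for illustration and are not from the original question. It also shows what "set them to 0" amounts to after standardizing (mean imputation), next to the default behaviour of dist(), which skips missing entries pairwise. Neither is offered as the "right" treatment of missing data; that depends on the assumptions you are willing to make, as noted above.

```r
# Hypothetical toy data: three scales on different ranges, with missing values.
dat <- data.frame(
  scale_a = c(10, 12, NA, 15, 11),
  scale_b = c(100, 250, 180, NA, 220),
  scale_c = c(1.2, 0.8, 1.5, 1.1, 0.9)
)

# scale() centers each column to mean 0 and rescales it to SD 1.
# Missing values are ignored when computing the mean and SD, and stay NA.
z <- scale(dat)

colMeans(z, na.rm = TRUE)      # approximately 0 for every column
apply(z, 2, sd, na.rm = TRUE)  # 1 for every column

# Mean imputation: on the standardized scale the column mean is 0,
# so replacing NA with 0 is the same as filling in the column mean.
z_imputed <- z
z_imputed[is.na(z_imputed)] <- 0

# dist() also accepts NAs directly: for each pair of rows it drops the
# columns with a missing value and scales the sum up proportionally.
d_pairwise <- dist(z)
d_imputed  <- dist(z_imputed)
```

Either distance object can then be passed to a clustering function such as hclust(). Comparing the two results is a quick way to see how sensitive the clustering is to the handling of missing values before moving on to a model-based approach such as aregImpute().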