Hello David,

Yes, my variables are all numeric. I have a few questions regarding your two options.
Would these still be the best options if missing data were not an issue? I was told that I should be performing NMDS because it makes few assumptions about the distribution of the data, but neither of your options uses it. If NMDS is not preferred and I were to perform a PCA, can you tell me why you chose prcomp()? My statistics text (Discovering Statistics Using R) explains PCA quite well using principal() in the psych package, so I am just wondering about the advantages of one over the other. I am overwhelmed by the number of ordination methods!

Thank you,
Elizabeth

On Mon, May 13, 2013 at 9:44 AM, David Carlson <dcarl...@tamu.edu> wrote:
> First, do not use HTML messages. They are converted to plain text and your
> table ends up a mess. See below. It appears the variables are all numeric?
> If so, there are two standard approaches to handling multiple scales and
> magnitudes with cluster analysis:
>
> 1. Use z-scores. The scale() function will convert each variable into a
> standard score with a mean of 0 and a standard deviation of 1. Then use
> Euclidean distance in the dist() function, which will adjust for your
> missing values.
>
> 2. Use prcomp() with scale. = TRUE (that is, a PCA of the correlation
> matrix of the variables) to extract a set of principal components and use
> the principal component scores in the cluster analysis. This may allow you
> to reduce the number of variables in the data set if the 29 variables are
> correlated with one another.
>
> -------------------------------------
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> From: Elizabeth Beck [mailto:elizabethbe...@gmail.com]
> Sent: Friday, May 10, 2013 1:20 PM
> To: dcarl...@tamu.edu
> Cc: r-help@r-project.org
> Subject: Re: [R] NMDS with missing data?
>
> Hi David,
>
> You are right that Bray-Curtis is not suitable for my dataset and that my
> variables are very different.
> Given your suggestions, I am struggling with how to transform or
> standardize my data, given that they vary so much. Additionally, looking
> at the dist() function, I am not sure which distance measure would be most
> appropriate. Euclidean seems to be the most widely used, but I'm not sure
> whether it is appropriate for my data (there is much more help for ecology
> data than for toxicology). Given a sample of my data below (a total of 287
> obs. of 29 variables), can you suggest a starting point?
>
> SODIUM   K  CL HCO3 ANION   CA    P GLUCOSE CHOLEST GGT GLDH   CK AST PROTEIN ALBUMIN GLOBULIN  A_G  UA BA CORTICO   T3    T4  THYROID
>    145 3.3 102   24    22  2.9 2.45     9.8     5.7   3    3  678   5      34      15       19 0.79 180  6   70.97 1.31 12.77 0.102376
>    146 3.2 102   21    26 2.89 2.68    11.1    6.78   3    4 1290   9      36      18       18    1 170 13    79.1 3.51 18.78 0.186751
>    147 2.5 103   22    25 2.96 2.59      10    5.78   3    6 1582  11      35      17       18 0.94 272 10   65.84 1.84  15.5 0.118602
>    148 2.5 101   21    29 2.91 2.91    10.6    5.83   3    3 1479   8      35      17       18 0.94 317  8    74.9 2.59 20.68 0.125389
>
> Thank you!
> Elizabeth
>
> On Thu, May 9, 2013 at 7:50 AM, David Carlson <dcarl...@tamu.edu> wrote:
> Since you pass your entire data.frame to metaMDS(), your first error
> probably comes from the fact that you have included ID as one of the
> variables. You should look at the results of
>
> str(dat)
>
> You can drop cases with missing values using
>
> dat2 <- na.omit(dat)
>
> metaMDS(dat2[,-1])
>
> would run the analysis on all but the first column (ID) with all the cases
> containing complete data. But that assumes that sex and exposure are not
> factors.
>
> Or you could use one of the distance measures in dist(), which adjust for
> missing values. However, dist() does not have an option to use Bray-Curtis
> (the default in metaMDS()).
> Bray-Curtis is designed for comparing species counts or proportions, so it
> is not clear that it is an appropriate dissimilarity measure for your data.
> Further, your data seem to contain a mixture of measurement scales and/or
> magnitudes, so some variable standardization or transformations are
> probably necessary before you can get any useful results from MDS.
>
> -------------------------------------
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Elizabeth Beck
> Sent: Wednesday, May 8, 2013 3:39 PM
> To: r-help@r-project.org
> Subject: [R] NMDS with missing data?
>
> Hi,
> I'm trying to run NMDS (non-metric multidimensional scaling) with the R
> package vegan (metaMDS), but I have a few NAs in my data set. I've tried to
> run it two ways.
>
> The first way uses my entire data set, which includes variables such as ID,
> sex, exposure, treatment, sodium, potassium, chloride, ...
>
> mydata.mds<-metaMDS(dat)
>
> I get the following error:
>
> in if (any(autotransform, noshare > 0, wascores) && any(comm < 0)) { :
>   missing value where TRUE/FALSE needed
> In addition: Warning messages:
> 1: In Ops.factor(left, right) : < not meaningful for factors
> 2: In Ops.factor(left, right) : < not meaningful for factors
> 3: In Ops.factor(left, right) : < not meaningful for factors
> 4: In Ops.factor(left, right) : < not meaningful for factors
> 5: In Ops.factor(left, right) : < not meaningful for factors
>
> The second way uses only the biochemical variables (29 in total).
>
> mydata.mds<-metaMDS(measurements)
>
> I get this error:
>
> Error in if (any(autotransform, noshare > 0, wascores) && any(comm < 0)) { :
>   missing value where TRUE/FALSE needed
>
> My go-to "na.rm=TRUE" does nothing. Any ideas on how to account for NAs,
> and if so, which of the above options I should be using?
> Thanks!
> Elizabeth
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
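For readers trying the suggestions in this thread, here is a minimal base-R sketch of David's two options. It uses six columns from the four sample rows Elizabeth posted, with one CK value set to NA. That NA is a hypothetical change made only to show how missing values are handled. The same distances are also passed to MASS::isoMDS(), a non-metric MDS function swapped in here for vegan's metaMDS() so the example needs no extra package. This illustrates the mechanics under those assumptions; it is not the analysis either correspondent ran.

```r
# Sketch only: David's two options applied to six columns of the four
# sample rows posted above. Four rows are far too few for a real
# analysis; this just shows the mechanics.

x <- data.frame(
  SODIUM  = c(145, 146, 147, 148),
  K       = c(3.3, 3.2, 2.5, 2.5),
  GLUCOSE = c(9.8, 11.1, 10, 10.6),
  CK      = c(678, 1290, 1582, 1479),
  CORTICO = c(70.97, 79.1, 65.84, 74.9),
  T4      = c(12.77, 18.78, 15.5, 20.68)
)

# Hypothetical missing value, added only to show how NAs are handled.
x_na <- x
x_na$CK[2] <- NA

## Option 1: z-scores, then Euclidean distance
# scale() centres and scales each column using its non-missing values,
# so CK (in the thousands) and K (around 3) end up on the same footing.
z <- scale(x_na)

# dist() skips a missing coordinate for the pairs of rows it affects and
# scales the sum up in proportion to the number of columns used.
d <- dist(z, method = "euclidean")

# The dissimilarities can go straight into a cluster analysis ...
hc <- hclust(d, method = "average")

# ... or into a non-metric MDS. MASS::isoMDS() stands in here for
# vegan::metaMDS() to stay within R's recommended packages.
nmds <- MASS::isoMDS(d, k = 2, trace = FALSE)

## Option 2: PCA on standardized variables
# The default prcomp() method stops on NAs, so keep complete cases only.
x_cc <- na.omit(x_na)

# scale. = TRUE standardizes every variable first, which makes this a
# PCA of the correlation matrix: the component variances (sdev^2) match
# the leading eigenvalues of cor(x_cc).
pc <- prcomp(x_cc, scale. = TRUE)
eig <- eigen(cor(x_cc))$values

# Scores on the first two components, usable as clustering input.
scores <- pc$x[, 1:2]
hc_pc <- hclust(dist(scores), method = "average")
```

The trade-off between the two options is visible here. The z-score route keeps every row because dist() tolerates NAs, while prcomp() forces you to drop (or impute) incomplete rows first. A constant column breaks both routes: scale() returns NaN for it, and prcomp(scale. = TRUE) stops with an error. GGT is constant in the four sample rows, so check for this before running the full data set.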