Re: [R] NMDS with missing data?

Elizabeth Beck Mon, 17 Jun 2013 10:04:34 -0700

Hello David,

Yes, my variables are all numeric. I have a few questions regarding your two
options.


Would these still be the best options if missing data were not an issue? I
was told that I should perform NMDS because it makes few assumptions about
the data distribution, but neither of your options uses it.

If NMDS is not preferred and I were to perform a PCA, can you tell me why
you chose prcomp()? My statistics text (Discovering Statistics Using R)
explains PCA quite well using principal() in the psych package, so I am just
wondering about the advantages of one over the other. I am overwhelmed by
the number of ordination methods!

Thank you,
Elizabeth

On Mon, May 13, 2013 at 9:44 AM, David Carlson <dcarl...@tamu.edu> wrote:

> First, do not use HTML messages. They are converted to plain text and your
> table ends up a mess. See below. It appears the variables are all numeric?
> If so, there are two standard approaches to handling multiple scales and
> magnitudes with cluster analysis:
>
> 1. Use z-scores. The scale() function will convert each variable into a
> standard score with a mean of 0 and a standard deviation of 1. Then use
> Euclidean distance in the dist() function, which will adjust for your
> missing values.
>
> 2. Use prcomp() with scale. = TRUE (a PCA of the correlation matrix of the
> variables) to extract a set of principal components, and use the principal
> component scores in the cluster analysis. This may allow you to reduce the
> number of variables in the data set if the 29 variables are correlated
> with one another.
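
A minimal sketch of the two options above, using only base R. The data
frame `measurements` is simulated purely for illustration (it is not the
poster's data), and the use of hclust() with average linkage, three clusters,
and two retained components are all assumptions, since the thread does not
name a clustering method.

```r
set.seed(1)
# Simulated stand-in for the biochemical variables (hypothetical data)
measurements <- data.frame(
  SODIUM  = rnorm(30, mean = 146, sd = 2),
  K       = rnorm(30, mean = 3, sd = 0.4),
  CK      = rnorm(30, mean = 1200, sd = 300),
  GLUCOSE = rnorm(30, mean = 10, sd = 1)
)
measurements$K[c(3, 17)] <- NA   # a few missing values, as in the thread

# Option 1: z-scores, then Euclidean distances.
# scale() ignores NAs when computing means and SDs; dist() skips the
# missing coordinates for each pair of rows and rescales the sum.
z   <- scale(measurements)
d1  <- dist(z, method = "euclidean")
cl1 <- cutree(hclust(d1, method = "average"), k = 3)

# Option 2: principal components of the standardized variables.
# prcomp() does not accept NAs in its default interface, so this sketch
# keeps complete cases only; scale. = TRUE standardizes each variable.
complete <- na.omit(measurements)
pc     <- prcomp(complete, center = TRUE, scale. = TRUE)
scores <- pc$x[, 1:2]            # first two components (arbitrary choice)
cl2    <- cutree(hclust(dist(scores), method = "average"), k = 3)

summary(pc)                      # proportion of variance per component
table(cl1)
table(cl2)
```

Note that option 1 keeps all 30 rows, whereas option 2 as sketched here
drops the rows containing NAs before the components are extracted.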
>
> -------------------------------------
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> From: Elizabeth Beck [mailto:elizabethbe...@gmail.com]
> Sent: Friday, May 10, 2013 1:20 PM
> To: dcarl...@tamu.edu
> Cc: r-help@r-project.org
> Subject: Re: [R] NMDS with missing data?
>
> Hi David,
>
> You are right that Bray-Curtis is not suitable for my dataset, and that
> my variables are very different. Given your suggestions, I am struggling
> with how to transform or standardize my data given that they vary so much.
> Additionally, looking at the dist() function, I am not sure which distance
> measure would be most appropriate. Euclidean seems to be the most widely
> used, but I'm not sure if it is appropriate for my data (there is much more
> help for ecology data than toxicology). Given a sample of my data below
> (287 obs. of 29 variables in total), can you suggest a starting point?
>
>   SODIUM   K  CL HCO3 ANION   CA    P GLUCOSE
> 1    145 3.3 102   24    22  2.9 2.45     9.8
> 2    146 3.2 102   21    26 2.89 2.68    11.1
> 3    147 2.5 103   22    25 2.96 2.59      10
> 4    148 2.5 101   21    29 2.91 2.91    10.6
>
>   CHOLEST GGT GLDH   CK AST PROTEIN ALBUMIN GLOBULIN
> 1     5.7   3    3  678   5      34      15       19
> 2    6.78   3    4 1290   9      36      18       18
> 3    5.78   3    6 1582  11      35      17       18
> 4    5.83   3    3 1479   8      35      17       18
>
>    A_G  UA BA CORTICO   T3    T4  THYROID
> 1 0.79 180  6   70.97 1.31 12.77 0.102376
> 2    1 170 13    79.1 3.51 18.78 0.186751
> 3 0.94 272 10   65.84 1.84  15.5 0.118602
> 4 0.94 317  8    74.9 2.59 20.68 0.125389
>
> Thank you!
> Elizabeth
>
> On Thu, May 9, 2013 at 7:50 AM, David Carlson <dcarl...@tamu.edu> wrote:
> Since you pass your entire data.frame to metaMDS(), your first error
> probably comes from the fact that you have included ID as one of the
> variables. You should look at the results of
>
> str(dat)
>
> You can drop cases with missing values using na.omit(). For example,
>
> > dat2 <- na.omit(dat)
> > metaMDS(dat2[,-1])
>
> would run the analysis on all but the first column (ID), using only the
> cases with complete data. But that assumes that sex and exposure are not
> factors.
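
The complete-case approach above can be sketched as follows. The data frame
here is hypothetical and only mimics the structure described in the thread
(an ID column, two factors, and numeric measurements). Selecting the numeric
columns explicitly, rather than dropping only column 1, is an assumption
added to cover the case where sex and exposure are factors.

```r
# Hypothetical data frame mimicking the structure described in the thread
dat <- data.frame(
  ID       = 1:6,
  sex      = factor(c("F", "M", "F", "M", "F", "M")),
  exposure = factor(c("low", "high", "low", "high", "low", "high")),
  sodium   = c(145, 146, NA, 148, 147, 145),
  chloride = c(102, 102, 103, NA, 101, 100)
)

str(dat)                               # shows which columns are factors

dat2 <- na.omit(dat)                   # keep only rows with no NAs
is_num <- vapply(dat2, is.numeric, logical(1))
is_num["ID"] <- FALSE                  # ID is numeric but not a measurement
num_only <- dat2[, is_num, drop = FALSE]

nrow(dat2)                             # 4 complete cases remain
names(num_only)                        # "sodium" "chloride"

# With vegan installed, the ordination would then be run on num_only,
# e.g. metaMDS(num_only), given enough rows and a suitable distance.
```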
>
> Or you could use one of the distance measures in dist(), which adjusts for
> missing values. However, dist() does not have an option to use Bray-Curtis
> (the default in metaMDS()). Bray-Curtis is designed for comparing species
> counts or proportions, so it is not clear that it is an appropriate
> dissimilarity measure for your data. Further, your data seem to contain a
> mixture of measurement scales and/or magnitudes, so some variable
> standardization or transformation is probably necessary before you can get
> any useful results from MDS.
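
As a small illustration of how dist() adjusts for missing values: per its
documentation, a column that is NA in either row is skipped for that pair,
and the sum is scaled up in proportion to the number of columns actually
used. The matrix below is made up for the purpose of the check.

```r
# Made-up matrix with one missing value in row "b"
x <- rbind(a = c(1, 2, 3, 4),
           b = c(2, NA, 5, 4),
           c = c(1, 2, 3, 8))

d <- dist(x, method = "euclidean")
d

# Distance a-b by hand: use the 3 columns observed in both rows,
# then scale the sum of squares by 4/3 (total columns / columns used).
used    <- !is.na(x["a", ]) & !is.na(x["b", ])
by_hand <- sqrt(sum((x["a", used] - x["b", used])^2) * ncol(x) / sum(used))

all.equal(as.matrix(d)["a", "b"], by_hand)
```

Because raw Euclidean distances are dominated by the variables with the
largest magnitudes, this would normally be applied to standardized data.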
>
> -------------------------------------
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Elizabeth Beck
> Sent: Wednesday, May 8, 2013 3:39 PM
> To: r-help@r-project.org
> Subject: [R] NMDS with missing data?
>
> Hi,
> I'm trying to run NMDS (non-metric multidimensional scaling) with R vegan
> (metaMDS) but I have a few NAs in my data set. I've tried to run it 2 ways.
>
> The first way with my entire data set which includes variables such as ID,
> sex, exposure, treatment, sodium, potassium, chloride....
>
> mydata.mds<-metaMDS(dat)
>
> I get the following error:
>
> Error in if (any(autotransform, noshare > 0, wascores) && any(comm < 0)) { :
>   missing value where TRUE/FALSE needed
> In addition: Warning messages:
> 1: In Ops.factor(left, right) : < not meaningful for factors
> 2: In Ops.factor(left, right) : < not meaningful for factors
> 3: In Ops.factor(left, right) : < not meaningful for factors
> 4: In Ops.factor(left, right) : < not meaningful for factors
> 5: In Ops.factor(left, right) : < not meaningful for factors
>
> The second way with only those last biochemical variables (29 in total).
>
> mydata.mds<-metaMDS(measurements)
>
> I get this error:
>
> Error in if (any(autotransform, noshare > 0, wascores) && any(comm < 0)) { :
>   missing value where TRUE/FALSE needed
>
> My go-to "na.rm=TRUE" does nothing. Any ideas on how to account for the NAs,
> and which of the above options I should be using?
> Thanks!
> Elizabeth
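
A plausible reading of the error, based only on the condition shown in the
message: when the data contain NAs and no negative values, any(comm < 0)
evaluates to NA, and if() cannot branch on NA. The sketch below reproduces
that behavior in base R with a made-up matrix; it does not call metaMDS().

```r
comm <- matrix(c(145, 3.3, NA, 102), nrow = 2)   # made-up values with one NA

any(comm < 0)                  # NA: no negative found, one comparison unknown
any(comm < 0, na.rm = TRUE)    # FALSE once NAs are ignored

# if() cannot branch on NA, which matches the error reported in the thread
res <- tryCatch(
  if (any(comm < 0)) "negative values present" else "no negative values",
  error = function(e) conditionMessage(e)
)
res
```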
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>

