Hi James, Ravi,

James wrote:
>> I have 20 "entities" for which I have ~500 measurements each. So, I
>> have a matrix of 20 rows by ~500 columns.
...
Perhaps I misread James' question, but I don't think so. As James described it, we have ~500 measurements made on 20 objects. A PCA on this [20 rows/observations by ~500 columns/descriptors/variables] returns an eigenvalue per component, and each of the ~500 columns/descriptors/variables will have a loading on each PC. (Note that with only 20 observations, at most 19 of those eigenvalues can be non-zero.) James wants to reduce his descriptors/measurements/variables to the "most important" ones (in terms of variance). A primitive way of doing this would be to examine the loadings on the first 2--3 PCs, choose those columns/descriptors/variables with the highest loadings, and throw away the rest. [He has already decided that he can throw away all but the first two PCs.]

In fact, it would be a very good idea to do a co-inertia analysis on the pre- and post-selection sets and look at the RV value. If this is above [thumbsuck] 0.9, then you're doing very well (there's a good plot method for this in ade4; cf. coinertia &c). But see Cadima et al. (and references therein, and elsewhere) for more sophisticated methods of subsetting.

Regards,
Mark.

Ravi Varadhan wrote:
>
> Mark,
>
> What you are referring to deals with the selection of covariates, since
> PCA doesn't do dimensionality reduction in the sense of covariate
> selection. But what James is asking for is to identify how much each
> data point contributes to individual PCs. I don't think that James's
> query makes much sense, unless he meant to ask: which individuals have
> high/low scores on PC1/PC2. Here are some comments that may be
> tangentially related to James's question:
>
> 1. If one is worried about a few data points contributing heavily to the
> estimation of the PCs, then one can use robust PCA, for example using
> robust covariance matrices. MASS has some tools for this.
> 2. The "biplot" of the first 2 PCs can give some insights.
> 3. The PCs, especially the last few, can be used to identify "outliers".
>
> Hope this is helpful,
> Ravi.
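To make the "highest loadings" approach concrete, here is a minimal R sketch on a simulated 20 x 500 matrix. The data, the object names, and the cut-off of 20 retained variables are illustrative assumptions, not from James's actual data set:

```r
## Simulated stand-in for James's data: 20 observations, ~500 variables
set.seed(1)
X <- matrix(rnorm(20 * 500), nrow = 20, ncol = 500)
colnames(X) <- paste0("V", seq_len(500))

pca <- prcomp(X, scale. = TRUE)

## pca$rotation holds the loadings: one row per original variable,
## one column per principal component
load12 <- pca$rotation[, 1:2]

## Rank variables by their largest absolute loading on PC1/PC2 and
## keep, say, the top 20
imp  <- apply(abs(load12), 1, max)
keep <- names(sort(imp, decreasing = TRUE))[1:20]
head(keep)
```

The retained subset could then be compared against the full variable set with a co-inertia analysis (ade4's coinertia) and the RV coefficient, as suggested above.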
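On Ravi's point 1, a brief sketch of robust PCA via a robust covariance matrix, using cov.rob() from MASS together with princomp()'s covmat argument. Note that robust covariance estimation requires more observations than variables, so it would not apply directly to a 20 x 500 matrix; the 50 x 5 data below are simulated purely for illustration:

```r
library(MASS)  # provides cov.rob()

## Simulated example: 50 observations, 5 variables (robust covariance
## estimation needs n > p, unlike ordinary PCA)
set.seed(1)
X <- matrix(rnorm(50 * 5), nrow = 50, ncol = 5)
colnames(X) <- paste0("V", 1:5)

rc   <- cov.rob(X)                  # resistant (MVE-type) covariance estimate
rpca <- princomp(covmat = rc$cov)   # PCA on the robust covariance matrix

summary(rpca)                       # variance explained per component
rpca$loadings[, 1:2]                # loadings on the first two PCs
```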
>
> ----------------------------------------------------------------------
> Ravi Varadhan, Ph.D.
> Assistant Professor, The Center on Aging and Health
> Division of Geriatric Medicine and Gerontology
> Johns Hopkins University
> Ph: (410) 502-2619
> Fax: (410) 614-9625
> Email: [EMAIL PROTECTED]
> Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
> ----------------------------------------------------------------------
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Difford
> Sent: Monday, July 02, 2007 1:55 PM
> To: r-help@stat.math.ethz.ch
> Subject: Re: [R] Question about PCA with prcomp
>
> Hi James,
>
> Have a look at Cadima et al.'s subselect package [Cadima worked with/was
> a student of Prof. Jolliffe, one of _the_ experts on PCA; Jolliffe
> devotes part of a chapter to this question in his text (Principal
> Component Analysis, pub. Springer)]. Then you should look at the
> psychometric literature: a good place to start would be Professor
> Revelle's psych package.
>
> Best,
> Mark.
>
> James R. Graham wrote:
>>
>> Hello All,
>>
>> The basic premise of what I want to do is the following:
>>
>> I have 20 "entities" for which I have ~500 measurements each. So, I
>> have a matrix of 20 rows by ~500 columns.
>>
>> The 20 entities fall into two classes: "good" and "bad."
>>
>> I would eventually like to derive a model that could then classify new
>> entities as being in "good territory" or "bad territory" based upon my
>> existing data set.
>>
>> I know that not all ~500 measurements are meaningful, so I thought the
>> best place to begin would be to do a PCA in order to reduce the amount
>> of data with which I have to work.
>>
>> I did this using the prcomp function and found that nearly 90% of the
>> variance in the data is explained by PC1 and PC2.
>>
>> So far, so good.
>>
>> I would now like to find out which of the original ~500 measurements
>> contribute to PC1 and PC2, and by how much.
>>
>> Any tips would be greatly appreciated! And apologies in advance if
>> this turns out to be an idiotic question.
>>
>> james
>>
>> ______________________________________________
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.