...but with 500 variables and only 20 'entities' (observations) you will have 481 PCs with dead zero eigenvalues. How small is 'smaller' and how many is "a few"?
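The count of 481 follows from rank: centring 20 observations leaves at most 19 linearly independent rows, so at most 19 of the 500 principal variances can be non-zero. A minimal sketch in R, using simulated Gaussian data as a stand-in for the real data (the data and the 1e-10 tolerance are assumptions, not from the thread):

```r
# Simulated stand-in: 20 observations on 500 variables (sizes from the thread)
set.seed(1)
n <- 20
p <- 500
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

pc <- prcomp(X)          # columns are centred by default
length(pc$sdev)          # prcomp returns only min(n, p) = 20 components

# Count principal variances that are non-zero up to rounding error
ev <- pc$sdev^2
n_nonzero <- sum(ev > 1e-10 * ev[1])
n_nonzero                # 19
p - n_nonzero            # 481 directions with zero variance
```

Note that prcomp only reports 20 components here, so all but one of the zero-variance directions never appear in the output at all.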
Everyone who has responded to this seems to accept the idea that PCA is the way to go here, but that is not clear to me at all.

There is a 2-sample structure in the 20 observations that you have. If you simply ignore that in doing your PCA you are making strong assumptions about sampling that would seem to me unlikely to be met. If you allow for the structure and project orthogonal to it then you are probably throwing the baby out with the bathwater - you want to choose variables which maximise separation between the 2 samples (and now you are up to 482 zero principal variances, if that matters...).

I think this problem probably needs a bit of a re-think. Some variant on singular LDA, for example, may be a more useful way to think about it.

Bill Venables.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ravi Varadhan
Sent: Monday, 2 July 2007 1:29 PM
To: 'Patrick Connolly'
Cc: r-help@stat.math.ethz.ch; 'Mark Difford'
Subject: Re: [R] Question about PCA with prcomp

The PCs that are associated with the smaller eigenvalues.

-----------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
-----------------------------------------------------------------------------------

-----Original Message-----
From: Patrick Connolly [mailto:[EMAIL PROTECTED]
Sent: Monday, July 02, 2007 4:23 PM
To: Ravi Varadhan
Cc: 'Mark Difford'; r-help@stat.math.ethz.ch
Subject: Re: [R] Question about PCA with prcomp

On Mon, 02-Jul-2007 at 03:16PM -0400, Ravi Varadhan wrote:

|> Mark,
|>
|> What you are referring to deals with the selection of covariates,
|> since PC doesn't do dimensionality reduction in the sense of
|> covariate selection.
|> But what Mark is asking for is to identify how much each data point
|> contributes to individual PCs. I don't think that Mark's query makes
|> much sense, unless he meant to ask: which individuals have high/low
|> scores on PC1/PC2. Here are some comments that may be tangentially
|> related to Mark's question:
|>
|> 1. If one is worried about a few data points contributing heavily to
|>    the estimation of PCs, then one can use robust PCA, for example,
|>    using robust covariance matrices. MASS has some tools for this.
|> 2. The "biplot" for the first 2 PCs can give some insights.
|> 3. PCs, especially the last few PCs, can be used to identify
|>    "outliers".

What is meant by "last few PCs"?

--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
   ___    Patrick Connolly
 {~._.~}                   Great minds discuss ideas
 _( Y )_                   Middle minds discuss events
(:_~*~_:)                  Small minds discuss people
 (_)-(_)                              ..... Anon
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
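The points raised in the thread can be tried out on simulated data. The sketch below is only illustrative. The 10/10 group split, the Gaussian data and the 1e-10 tolerance are assumptions, and the robust step uses R's built-in stackloss data because, as far as I know, the MASS robust covariance estimators need more observations than variables and so would not apply directly to a 20 x 500 matrix.

```r
set.seed(2)
n <- 20
p <- 500
grp <- factor(rep(c("A", "B"), each = 10))   # hypothetical 10/10 split
X <- matrix(rnorm(n * p), n, p)              # simulated stand-in data

# Projecting orthogonal to the 2-sample structure: remove each group's
# own mean from every variable, which costs one more dimension.
Xw <- X - apply(X, 2, function(col) ave(col, grp))
evw <- prcomp(Xw, center = FALSE)$sdev^2
n_nonzero_w <- sum(evw > 1e-10 * evw[1])
p - n_nonzero_w                              # 482

# "Which individuals have high/low scores on PC1/PC2":
pc <- prcomp(X)
scores <- pc$x[, 1:2]
extreme_pc1 <- order(abs(scores[, 1]), decreasing = TRUE)[1:3]
extreme_pc1                                  # rows with the largest |PC1| score

# One possible reading of "last few PCs" when p > n: the last components
# that still have non-zero variance.
ev <- pc$sdev^2
k <- sum(ev > 1e-10 * ev[1])                 # index of the last such component
last_scores <- pc$x[, k]

# Robust PCA from a robust covariance estimate (n > p case only)
rob_pc <- princomp(covmat = MASS::cov.rob(stackloss))
rob_pc$sdev
```

biplot(pc) would draw the first two components, though with 500 variable arrows the plot is unlikely to be readable without first reducing the variable set.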