Re: [R] pca vs. pfa: dimension reduction
At 18:22 25/03/2009, Jonathan Baron wrote:
> On 03/25/09 19:06, soeren.vo...@eawag.ch wrote:
>> Can't make sense of calculated results and hope I'll find help here.
>>
>> I've collected answers from about 600 persons concerning three
>> variables. I hypothesise those three variables to be components (or
>> indicators) of one latent factor. In order to reduce data (vars), I
>> had the following idea: Calculate the factor underlying these three
>> vars. Use the loadings and the original var values to construct a new
>> (artificial) var: (B1 * X1) + (B2 * X2) + (B3 * X3) = ArtVar (brackets
>> for readability). Use ArtVar for further analysis of the data, that
>> is, as predictor etc.
>>
>> In my (I realise, elementary) psychological statistics readings I was
>> taught to use pca for these problems. Referring to Venables & Ripley
>> (2002, chapter 11), I applied "princomp" to my vars. But the outcome
>> shows 4 components -- which is obviously not what I want. Reading
>> further I found "factanal", which produces loadings on the one
>> specified factor very fine. But since this is a contradiction to
>> theoretical introductions in so many texts I'm completely confused
>> whether I'm right with these calculations.

Perhaps I am missing something here but how do you get four components
with three variables?

>> (1) Is there an easy example, which explains the differences between
>> pca and pfa? (2) Which R procedure should I use to get what I want?
>
> Possibly what you want is the first principal component, which is the
> weighted sum that accounts for the most variance of the three
> variables. It does essentially what you say in your first paragraph.
> So you want something like
>
>     p1 <- princomp(cbind(X1,X2,X3),scores=TRUE)
>     p1$scores[,1]
>
> The trouble with factanal is that it does a rotation, and the default
> is varimax. The first factor will usually not be the same as the first
> principal component (I think). Perhaps there is another rotation
> option that will give you this, but why bother even to look?
> (I didn't, obviously.)
>
> Jon
> --
> Jonathan Baron, Professor of Psychology, University of Pennsylvania
> Home page: http://www.sas.upenn.edu/~baron

Michael Dewey
http://www.aghmed.fsnet.co.uk

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
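On the "four components with three variables" question, a quick check on simulated data may help. This is only a sketch: the data below are a made-up stand-in for the three survey variables (the loadings 0.8, 0.7, 0.6 and the name f are assumptions, not anything from the original post). princomp returns one component per input column, so three columns give three components.

```r
# Simulated stand-in for the poster's data (hypothetical values throughout)
set.seed(1)
n <- 600
f <- rnorm(n)                                  # assumed common factor
X <- cbind(X1 = 0.8 * f + rnorm(n, sd = 0.6),
           X2 = 0.7 * f + rnorm(n, sd = 0.7),
           X3 = 0.6 * f + rnorm(n, sd = 0.8))

p <- princomp(X)

length(p$sdev)    # number of components = number of columns in X
dim(p$loadings)   # one row per variable, one column per component
summary(p)        # standard deviations and proportions of variance
```

If a fourth component shows up in real output, one possible explanation is that a fourth column (an ID variable, say) was passed in along with the three items; that is a guess, since the thread does not show the actual call.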
Re: [R] pca vs. pfa: dimension reduction
Dear Sören, Mark, and Jon,

At 12:51 PM -0700 3/25/09, Mark Difford wrote:
> Hi Sören,
>
>> (1) Is there an easy example, which explains the differences between
>> pca and pfa? (2) Which R procedure should I use to get what I want?
>
> There are a number of fundamental differences between PCA and FA
> (Factor Analysis), which unfortunately are quite widely ignored. FA is
> explicitly model-based, whereas PCA does not invoke an explicit model.
> FA is also designed to detect structure, whereas PCA focuses on
> variance, to put things simply. In more detail, the two methods
> "attack" the covariance matrix in different ways: in PCA the focus of
> decomposition is on the diagonal elements, whereas in FA the focus is
> on the off-diagonal elements.

This is nicely put. Less concisely, see pages 139-149 of my (under
development) book on psychometric theory using R
(http://personality-project.org/r/book/Chapter6.pdf). In particular,
on page 149:

"Although on the surface, the component model and factor model appear
to very similar (compare Tables 6.6 and 6.7), they are in fact very
different. One example of this is when an additional variable is added
to the correlation matrix (Table 6.8). In this case, two additional
variables are added to the correlation matrix. The factor pattern does
not change, but the component pattern does. Why is this? Because the
components are aimed at accounting for all of the variance of the
matrix, adding new variables increases the amount of variance to be
explained and changes the previous estimates. But the common part of
the variables (that which is estimated by factors) is not sensitive to
the presence (or absence) of other variables. Although a fundamental
difference between the two models, this problem of the additional
variable is most obvious when there are not very many variables and
becomes less of an empirical problem as the number of variables
increases."

> Take a look at Prof. Revelle's psych package (function omega &c). Note
> also that factanal has a rotation = "none" option.
>
> Regards, Mark.
>
> soeren.vogel wrote:
>> Can't make sense of calculated results and hope I'll find help here.
>>
>> I've collected answers from about 600 persons concerning three
>> variables. I hypothesise those three variables to be components (or
>> indicators) of one latent factor. In order to reduce data (vars), I
>> had the following idea: Calculate the factor underlying these three
>> vars. Use the loadings and the original var values to construct a new
>> (artificial) var: (B1 * X1) + (B2 * X2) + (B3 * X3) = ArtVar (brackets
>> for readability). Use ArtVar for further analysis of the data, that
>> is, as predictor etc.

For 3 variables, there is only one factor possible, so rotation is not
a problem. (For 1 factor, there are 3 unknown factor loadings and 3
known correlations. The model is just identified.)

>> In my (I realise, elementary) psychological statistics readings I was
>> taught to use pca for these problems. Referring to Venables & Ripley
>> (2002, chapter 11), I applied "princomp" to my vars. But the outcome
>> shows 4 components -- which is obviously not what I want. Reading
>> further I found "factanal", which produces loadings on the one
>> specified factor very fine. But since this is a contradiction to
>> theoretical introductions in so many texts I'm completely confused
>> whether I'm right with these calculations.

If you want to think of what these variables have in common, use
factor analysis; if you want to summarize them all most efficiently
with one composite, use principal components. These are very different
models.

As Mark said, the difference is that FA accounts for the covariances
(the off-diagonal elements), which reflect what the variables have in
common. PCA accounts for the entire matrix, which in a 3 x 3 problem
is primarily the diagonal variances.

Let me know if you need more information.

Bill

>> (1) Is there an easy example, which explains the differences between
>> pca and pfa? (2) Which R procedure should I use to get what I want?
>>
>> Thank you for your help
>>
>> Sören
>>
>> Refs.:
>>
>> Venables, W. N., and Ripley, B. D. (2002). Modern applied statistics
>> with S (4th edition). New York: Springer.
>
> --
> View this message in context: http://www.nabble.com/pca-vs.-pfa%3A-dimension-reduction-tp22707926p22709481.html
> Sent from the R help mailing list archive at Nabble.com.

--
William Revelle http:
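The "just identified" remark and the diagonal versus off-diagonal distinction can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated (the generating loadings and the names f, lam, pc are invented for illustration), and the first-component loadings are computed by hand from the eigendecomposition of the correlation matrix rather than taken from any particular package.

```r
# Simulated stand-in for three indicators of one latent factor (hypothetical)
set.seed(3)
n <- 600
f <- rnorm(n)
X <- cbind(X1 = 0.8 * f + rnorm(n, sd = 0.6),
           X2 = 0.7 * f + rnorm(n, sd = 0.7),
           X3 = 0.6 * f + rnorm(n, sd = 0.8))
R <- cor(X)

# One-factor model: 3 loadings estimated from 3 correlations
fa <- factanal(X, factors = 1)
fa$dof                                   # degrees of freedom left over

lam  <- as.vector(fa$loadings)           # factor loadings
R_fa <- outer(lam, lam)                  # correlations implied by the factor

# First principal component of the same correlation matrix
e    <- eigen(R)
pc   <- e$vectors[, 1] * sqrt(e$values[1])   # component loadings
R_pc <- outer(pc, pc)                    # correlations implied by component 1

off <- upper.tri(R)
max(abs(R[off] - R_fa[off]))   # FA residuals on the off-diagonal elements
max(abs(R[off] - R_pc[off]))   # PCA residuals on the same elements
```

With zero degrees of freedom the factor solution should reproduce the three observed correlations almost exactly, while the single component, which also has to absorb the diagonal variances, misses them by a visible margin.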
Re: [R] pca vs. pfa: dimension reduction
Hi Sören,

>> (1) Is there an easy example, which explains the differences between
>> pca and pfa? (2) Which R procedure should I use to get what I want?

There are a number of fundamental differences between PCA and FA
(Factor Analysis), which unfortunately are quite widely ignored. FA is
explicitly model-based, whereas PCA does not invoke an explicit model.
FA is also designed to detect structure, whereas PCA focuses on
variance, to put things simply. In more detail, the two methods
"attack" the covariance matrix in different ways: in PCA the focus of
decomposition is on the diagonal elements, whereas in FA the focus is
on the off-diagonal elements.

Take a look at Prof. Revelle's psych package (function omega &c). Note
also that factanal has a rotation = "none" option.

Regards, Mark.

soeren.vogel wrote:
>
> Can't make sense of calculated results and hope I'll find help here.
>
> I've collected answers from about 600 persons concerning three
> variables. I hypothesise those three variables to be components (or
> indicators) of one latent factor. In order to reduce data (vars), I
> had the following idea: Calculate the factor underlying these three
> vars. Use the loadings and the original var values to construct a new
> (artificial) var: (B1 * X1) + (B2 * X2) + (B3 * X3) = ArtVar (brackets
> for readability). Use ArtVar for further analysis of the data, that
> is, as predictor etc.
>
> In my (I realise, elementary) psychological statistics readings I was
> taught to use pca for these problems. Referring to Venables & Ripley
> (2002, chapter 11), I applied "princomp" to my vars. But the outcome
> shows 4 components -- which is obviously not what I want. Reading
> further I found "factanal", which produces loadings on the one
> specified factor very fine. But since this is a contradiction to
> theoretical introductions in so many texts I'm completely confused
> whether I'm right with these calculations.
>
> (1) Is there an easy example, which explains the differences between
> pca and pfa? (2) Which R procedure should I use to get what I want?
>
> Thank you for your help
>
> Sören
>
> Refs.:
>
> Venables, W. N., and Ripley, B. D. (2002). Modern applied statistics
> with S (4th edition). New York: Springer.
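The rotation = "none" option mentioned above can be tried as follows. This is a minimal sketch on simulated data (the data and the object names fa_none and fa_default are made up for illustration). It also requests regression factor scores, which could serve as the single derived variable the original question asks for.

```r
# Simulated stand-in for the three survey variables (hypothetical)
set.seed(4)
n <- 600
f <- rnorm(n)
X <- cbind(X1 = 0.8 * f + rnorm(n, sd = 0.6),
           X2 = 0.7 * f + rnorm(n, sd = 0.7),
           X3 = 0.6 * f + rnorm(n, sd = 0.8))

# One factor, no rotation, with regression-method factor scores
fa_none    <- factanal(X, factors = 1, rotation = "none",
                       scores = "regression")
# Same model with the default rotation (varimax)
fa_default <- factanal(X, factors = 1)

fa_none$loadings
# With a single factor there is nothing to rotate, so both fits agree
all.equal(as.vector(fa_none$loadings), as.vector(fa_default$loadings))

ArtVar_fa <- fa_none$scores[, 1]         # one score per respondent
# How closely the factor score tracks the first principal component score
abs(cor(ArtVar_fa, princomp(X)$scores[, 1]))
```

The two composites are typically very highly correlated in a setting like this, so the choice between them is more about which model one wants to interpret than about the resulting predictor.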
Re: [R] pca vs. pfa: dimension reduction
On 03/25/09 19:06, soeren.vo...@eawag.ch wrote:
> Can't make sense of calculated results and hope I'll find help here.
>
> I've collected answers from about 600 persons concerning three
> variables. I hypothesise those three variables to be components (or
> indicators) of one latent factor. In order to reduce data (vars), I
> had the following idea: Calculate the factor underlying these three
> vars. Use the loadings and the original var values to construct a new
> (artificial) var: (B1 * X1) + (B2 * X2) + (B3 * X3) = ArtVar (brackets
> for readability). Use ArtVar for further analysis of the data, that
> is, as predictor etc.
>
> In my (I realise, elementary) psychological statistics readings I was
> taught to use pca for these problems. Referring to Venables & Ripley
> (2002, chapter 11), I applied "princomp" to my vars. But the outcome
> shows 4 components -- which is obviously not what I want. Reading
> further I found "factanal", which produces loadings on the one
> specified factor very fine. But since this is a contradiction to
> theoretical introductions in so many texts I'm completely confused
> whether I'm right with these calculations.
>
> (1) Is there an easy example, which explains the differences between
> pca and pfa? (2) Which R procedure should I use to get what I want?

Possibly what you want is the first principal component, which is the
weighted sum that accounts for the most variance of the three
variables. It does essentially what you say in your first paragraph.
So you want something like

    p1 <- princomp(cbind(X1,X2,X3),scores=TRUE)
    p1$scores[,1]

The trouble with factanal is that it does a rotation, and the default
is varimax. The first factor will usually not be the same as the first
principal component (I think). Perhaps there is another rotation
option that will give you this, but why bother even to look? (I
didn't, obviously.)

Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
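To connect the princomp suggestion with the (B1 * X1) + (B2 * X2) + (B3 * X3) idea in the question, the first-component score can be rebuilt by hand from the loadings. The sketch below uses simulated data (all values and the names w, Xc, ArtVar are illustrative assumptions) and relies on the fact that princomp, with its default cor = FALSE, centres the variables before forming the weighted sum.

```r
# Simulated stand-in for X1, X2, X3 (hypothetical)
set.seed(2)
n <- 600
f <- rnorm(n)
X <- cbind(X1 = 0.8 * f + rnorm(n, sd = 0.6),
           X2 = 0.7 * f + rnorm(n, sd = 0.7),
           X3 = 0.6 * f + rnorm(n, sd = 0.8))

p1 <- princomp(X, scores = TRUE)

w  <- p1$loadings[, 1]            # weights B1, B2, B3 of the first component
Xc <- sweep(X, 2, p1$center)      # subtract the column means princomp used
ArtVar <- as.vector(Xc %*% w)     # B1*X1 + B2*X2 + B3*X3 on centred data

# The hand-built composite matches the scores princomp reports
all.equal(ArtVar, unname(p1$scores[, 1]))

# Share of total variance carried by the first component
p1$sdev[1]^2 / sum(p1$sdev^2)
```

Note that the sign of a principal component is arbitrary, so the weights may all come out negative; flipping the sign of ArtVar changes nothing of substance when it is used as a predictor.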
[R] pca vs. pfa: dimension reduction
Can't make sense of calculated results and hope I'll find help here.

I've collected answers from about 600 persons concerning three
variables. I hypothesise those three variables to be components (or
indicators) of one latent factor. In order to reduce data (vars), I
had the following idea: Calculate the factor underlying these three
vars. Use the loadings and the original var values to construct a new
(artificial) var: (B1 * X1) + (B2 * X2) + (B3 * X3) = ArtVar (brackets
for readability). Use ArtVar for further analysis of the data, that
is, as predictor etc.

In my (I realise, elementary) psychological statistics readings I was
taught to use pca for these problems. Referring to Venables & Ripley
(2002, chapter 11), I applied "princomp" to my vars. But the outcome
shows 4 components -- which is obviously not what I want. Reading
further I found "factanal", which produces loadings on the one
specified factor very fine. But since this is a contradiction to
theoretical introductions in so many texts I'm completely confused
whether I'm right with these calculations.

(1) Is there an easy example, which explains the differences between
pca and pfa? (2) Which R procedure should I use to get what I want?

Thank you for your help

Sören

Refs.:

Venables, W. N., and Ripley, B. D. (2002). Modern applied statistics
with S (4th edition). New York: Springer.