Re: [R] pca vs. pfa: dimension reduction

2009-03-27 Thread Michael Dewey

At 18:22 25/03/2009, Jonathan Baron wrote:

On 03/25/09 19:06, soeren.vo...@eawag.ch wrote:
 Can't make sense of calculated results and hope I'll find help here.

 I've collected answers from about 600 persons concerning three
 variables. I hypothesise those three variables to be components (or
 indicators) of one latent factor. In order to reduce data (vars), I
 had the following idea: calculate the factor underlying these three
 vars. Use the loadings and the original var values to construct a new
 (artificial) var: (B1 * X1) + (B2 * X2) + (B3 * X3) = ArtVar (brackets
 for readability). Use ArtVar for further analysis of the data, that
 is, as a predictor etc.

 In my (I realise, elementary) psychological statistics readings I was
 taught to use pca for these problems. Referring to Venables & Ripley
 (2002, chapter 11), I applied princomp to my vars. But the outcome
 shows 4 components -- which is obviously not what I want. Reading
 further I found factanal, which produces loadings on the one
 specified factor just fine. But since this contradicts the
 theoretical introductions in so many texts, I'm completely confused
 about whether I'm right with these calculations.


Perhaps I am missing something here, but how do you get four
components with three variables?




 (1) Is there an easy example that explains the differences between
 pca and pfa? (2) Which R procedure should I use to get what I want?

Possibly what you want is the first principal component, which is the
weighted sum that accounts for the most variance of the three
variables.  It does essentially what you say in your first paragraph.
So you want something like

p1 <- princomp(cbind(X1, X2, X3), scores = TRUE)
p1$scores[, 1]

The trouble with factanal is that it does a rotation, and the default
is varimax.  The first factor will usually not be the same as the
first principal component (I think).  Perhaps there is another
rotation option that will give you this, but why bother even to look?
(I didn't, obviously.)

Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron


Michael Dewey
http://www.aghmed.fsnet.co.uk

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pca vs. pfa: dimension reduction

2009-03-25 Thread Jonathan Baron
On 03/25/09 19:06, soeren.vo...@eawag.ch wrote:
 Can't make sense of calculated results and hope I'll find help here.
 
 I've collected answers from about 600 persons concerning three
 variables. I hypothesise those three variables to be components (or
 indicators) of one latent factor. In order to reduce data (vars), I
 had the following idea: calculate the factor underlying these three
 vars. Use the loadings and the original var values to construct a new
 (artificial) var: (B1 * X1) + (B2 * X2) + (B3 * X3) = ArtVar (brackets
 for readability). Use ArtVar for further analysis of the data, that
 is, as a predictor etc.
 
 In my (I realise, elementary) psychological statistics readings I was
 taught to use pca for these problems. Referring to Venables & Ripley
 (2002, chapter 11), I applied princomp to my vars. But the outcome
 shows 4 components -- which is obviously not what I want. Reading
 further I found factanal, which produces loadings on the one
 specified factor just fine. But since this contradicts the
 theoretical introductions in so many texts, I'm completely confused
 about whether I'm right with these calculations.
 
 (1) Is there an easy example that explains the differences between
 pca and pfa? (2) Which R procedure should I use to get what I want?

Possibly what you want is the first principal component, which is the
weighted sum that accounts for the most variance of the three
variables.  It does essentially what you say in your first paragraph.
So you want something like

p1 <- princomp(cbind(X1, X2, X3), scores = TRUE)
p1$scores[, 1]

The trouble with factanal is that it does a rotation, and the default
is varimax.  The first factor will usually not be the same as the
first principal component (I think).  Perhaps there is another
rotation option that will give you this, but why bother even to look?
(I didn't, obviously.)
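
Jon's suggestion can be run end to end; here is a hedged sketch with simulated data standing in for the poster's real variables (the sample size, loadings, and names X1-X3 are all invented for illustration):

```r
## Simulate three indicators of one latent factor (illustrative only).
set.seed(1)
n  <- 600
f  <- rnorm(n)                       # the latent factor
X1 <- 0.8 * f + rnorm(n, sd = 0.6)
X2 <- 0.7 * f + rnorm(n, sd = 0.7)
X3 <- 0.6 * f + rnorm(n, sd = 0.8)

## First principal component as a single composite score.
p1 <- princomp(cbind(X1, X2, X3), cor = TRUE, scores = TRUE)
ArtVar <- p1$scores[, 1]             # one score per person
length(ArtVar)                       # 600, usable as a predictor
```

Because the sign of a principal component is arbitrary, `abs(cor(ArtVar, X1))` is the quantity to inspect when checking that the composite tracks the indicators.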

Jon
-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron



Re: [R] pca vs. pfa: dimension reduction

2009-03-25 Thread Mark Difford

Hi Sören,

 (1) Is there an easy example that explains the differences between
 pca and pfa? (2) Which R procedure should I use to get what I want?

There are a number of fundamental differences between PCA and FA (Factor
Analysis), which unfortunately are quite widely ignored. FA is explicitly
model-based, whereas PCA does not invoke an explicit model. FA is also
designed to detect structure, whereas PCA focuses on variance, to put things
simply. In more detail, the two methods attack the covariance matrix in
different ways: in PCA the focus of decomposition is on the diagonal
elements, whereas in FA the focus is on the off-diagonal elements.

Take a look at Prof. Revelle's psych package (function omega). Note also
that factanal has a rotation = "none" option.
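
Mark's factanal pointer can be tried in a few lines; this is a minimal sketch on simulated stand-in data (variable names and loadings invented):

```r
## Hedged sketch: factanal with rotation = "none" on simulated data.
set.seed(42)
n <- 600
f <- rnorm(n)
d <- data.frame(X1 = 0.8 * f + rnorm(n, sd = 0.6),
                X2 = 0.7 * f + rnorm(n, sd = 0.7),
                X3 = 0.6 * f + rnorm(n, sd = 0.8))

fa1 <- factanal(d, factors = 1, rotation = "none", scores = "regression")
fa1$loadings        # loadings on the single factor
head(fa1$scores)    # per-person factor scores
```

With a single factor there is nothing to rotate, so `rotation = "none"` and the default give the same answer here; the option only matters once two or more factors are extracted.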

Regards, Mark.


soeren.vogel wrote:
 
 Can't make sense of calculated results and hope I'll find help here.
 
 I've collected answers from about 600 persons concerning three
 variables. I hypothesise those three variables to be components (or
 indicators) of one latent factor. In order to reduce data (vars), I
 had the following idea: calculate the factor underlying these three
 vars. Use the loadings and the original var values to construct a new
 (artificial) var: (B1 * X1) + (B2 * X2) + (B3 * X3) = ArtVar (brackets
 for readability). Use ArtVar for further analysis of the data, that
 is, as a predictor etc.
 
 In my (I realise, elementary) psychological statistics readings I was
 taught to use pca for these problems. Referring to Venables & Ripley
 (2002, chapter 11), I applied princomp to my vars. But the outcome
 shows 4 components -- which is obviously not what I want. Reading
 further I found factanal, which produces loadings on the one
 specified factor just fine. But since this contradicts the
 theoretical introductions in so many texts, I'm completely confused
 about whether I'm right with these calculations.
 
 (1) Is there an easy example that explains the differences between
 pca and pfa? (2) Which R procedure should I use to get what I want?
 
 Thank you for your help
 
 Sören
 
 
 Refs.:
 
 Venables, W. N., and Ripley, B. D. (2002). Modern applied statistics  
 with S (4th edition). New York: Springer.
 

-- 
View this message in context: 
http://www.nabble.com/pca-vs.-pfa%3A-dimension-reduction-tp22707926p22709481.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] pca vs. pfa: dimension reduction

2009-03-25 Thread William Revelle

Dear Sören, Mark, and Jon,

At 12:51 PM -0700 3/25/09, Mark Difford wrote:

Hi Sören,

 (1) Is there an easy example that explains the differences between
 pca and pfa? (2) Which R procedure should I use to get what I want?


There are a number of fundamental differences between PCA and FA (Factor
Analysis), which unfortunately are quite widely ignored. FA is explicitly
model-based, whereas PCA does not invoke an explicit model. FA is also
designed to detect structure, whereas PCA focuses on variance, to put things
simply. In more detail, the two methods attack the covariance matrix in
different ways: in PCA the focus of decomposition is on the diagonal
elements, whereas in FA the focus is on the off-diagonal elements.


This is nicely put.  Less concisely, see pages 139-149 of my (under
development) book on psychometric theory using R
(http://personality-project.org/r/book/Chapter6.pdf).

In particular, on page 149:

Although on the surface, the component model and factor model appear to
be very similar (compare Tables 6.6 and 6.7), they are in fact very
different. One example of this is when an additional variable is added
to the correlation matrix (Table 6.8). In this case, two additional
variables are added to the correlation matrix. The factor pattern does
not change, but the component pattern does. Why is this? Because the
components are aimed at accounting for all of the variance of the
matrix, adding new variables increases the amount of variance to be
explained and changes the previous estimates. But the common part of the
variables (that which is estimated by factors) is not sensitive to the
presence (or absence) of other variables. Although a fundamental
difference between the two models, this problem of the additional
variable is most obvious when there are not very many variables and
becomes less of an empirical problem as the number of variables
increases.
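
The quoted point can be checked numerically. This is an illustrative sketch with simulated data (all names and loadings invented): one unrelated variable is added, and the factor and component loadings are compared before and after.

```r
## Factor loadings stay put when an unrelated variable joins;
## the component pattern shifts.
set.seed(7)
n  <- 600
f  <- rnorm(n)
d3 <- data.frame(X1 = 0.8 * f + rnorm(n, sd = 0.6),
                 X2 = 0.7 * f + rnorm(n, sd = 0.7),
                 X3 = 0.6 * f + rnorm(n, sd = 0.8))
d4 <- cbind(d3, X4 = rnorm(n))       # pure noise, unrelated to f

fa3 <- factanal(d3, factors = 1)
fa4 <- factanal(d4, factors = 1)
cbind(before = fa3$loadings[, 1],
      after  = fa4$loadings[1:3, 1]) # X1-X3 loadings barely move

princomp(d3, cor = TRUE)$loadings[, 1]
princomp(d4, cor = TRUE)$loadings[, 1]  # component pattern changes
```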



Take a look at Prof. Revelle's psych package (function omega). Note also
that factanal has a rotation = "none" option.

Regards, Mark.


soeren.vogel wrote:


 Can't make sense of calculated results and hope I'll find help here.

 I've collected answers from about 600 persons concerning three
 variables. I hypothesise those three variables to be components (or
 indicators) of one latent factor. In order to reduce data (vars), I
 had the following idea: calculate the factor underlying these three
 vars. Use the loadings and the original var values to construct a new
 (artificial) var: (B1 * X1) + (B2 * X2) + (B3 * X3) = ArtVar (brackets
 for readability). Use ArtVar for further analysis of the data, that
 is, as a predictor etc.


For 3 variables, there is only one factor possible, so rotation is not a
problem. (For 1 factor, there are 3 unknown factor loadings and 3 known
correlations. The model is just identified.)
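
Bill's "just identified" remark means the three loadings can even be solved by hand: with one factor, r12 = l1*l2, r13 = l1*l3, and r23 = l2*l3, so l1 = sqrt(r12*r13/r23), and similarly for the others. A sketch with made-up correlations:

```r
## Closed-form loadings for the one-factor, three-variable case.
r12 <- 0.56; r13 <- 0.48; r23 <- 0.42   # invented correlations
l1 <- sqrt(r12 * r13 / r23)             # 0.8
l2 <- sqrt(r12 * r23 / r13)             # 0.7
l3 <- sqrt(r13 * r23 / r12)             # 0.6
c(l1 * l2, l1 * l3, l2 * l3)            # reproduces r12, r13, r23
```

Zero degrees of freedom also means there is no fit statistic: the model reproduces these three correlations perfectly by construction.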




 
 In my (I realise, elementary) psychological statistics readings I was
 taught to use pca for these problems. Referring to Venables & Ripley
 (2002, chapter 11), I applied princomp to my vars. But the outcome
 shows 4 components -- which is obviously not what I want. Reading
 further I found factanal, which produces loadings on the one
 specified factor just fine. But since this contradicts the
 theoretical introductions in so many texts, I'm completely confused
 about whether I'm right with these calculations.


If you want to think of what these variables have in common, use factor
analysis; if you want to summarize them all most efficiently with one
composite, use principal components. These are very different models.


As Mark said, the difference is that FA accounts for the covariances
(the off-diagonal elements), which reflect what the variables have in
common. PCA accounts for the entire matrix, which in a 3 x 3 problem is
primarily the diagonal variances.


Let me know if you need more information.

Bill



 
 (1) Is there an easy example that explains the differences between
 pca and pfa? (2) Which R procedure should I use to get what I want?






 

 Thank you for your help

 Sören


 Refs.:

 Venables, W. N., and Ripley, B. D. (2002). Modern applied statistics 
 with S (4th edition). New York: Springer.










--
William Revelle