Re: QUERY: principal component regression to be expressed in original

Gottfried Helms Fri, 14 Mar 2003 14:06:37 -0800

Hi Sangdon,

 here is a detailed protocol for how to proceed.


 Part 1 reproduces your varimax factors and the PC-regression weights.

 Part 2 derives the regression weights for the X-variables from there.
 As I said, it is just a matrix multiplication.




In short:

  if LXY is your varimax-rotated loadings matrix, LX (4x4) is the block of
  X-loadings in it, and LY (1x4) is the row of Y-loadings, so that

  LXY = [LX,LY] = varimax(cholesky(CORR))

  then the beta-weights for Y expressed in terms of X are

  BETA = LY * inv(LX) 

Hope it helps... :-)

Regards-

Gottfried Helms

*********************** Use fixed font ****************************************

== Part 1 

;  MatMate listing of 14.03.03 21:46:52

;=================================================================================================
;-------------- Getting data --------------------------------------------
;=================================================================================================
[0] set listing=on ccdezweite=3 ccfeldweite=7

; get your data as matrix from file:
[1] A = CSVDATEI("F:\TEMP\DATA.CSV")'
  // stored as:  y   x1..x4    fs1..fs4 ; transposed to have the variables along rows


[2] n   = columns(A)          // = 34 cases
     
[3] Y   = subzl(A,1)          // splitting the rows of data in var-groups
[4] X   = subzl(A,2..5)
[5] OFS = subzl(A,6..9)       // I use the O-riginal F-actor S-cores for checking
     
[6] Z = {x,y,ofs}             // reorganizing data 
[7] Z = Z/:stddevzl(Z)        // matmate uses s²=sum(dev²)/N for variance, so I have to normalize
                              // the minitab-data
     
;=================================================================================================
;               Correlation/Factorization 
;=================================================================================================
     
[8] COR = Z*Z'/n              // reproducing correlation matrix
[9] L = cholesky(COR)         // factorizing cor (cholesky-triangular shape)
     
[10] L1 = subsp(L,1..5)       // cholesky produces only 5 factors: use only 5 columns






[11] disp =  l1 || null(9,1)  ||  subsp(z,1..5) 

disp:
             Factor-loadings                       !     Scores ... (only first 5 cases shown)
------------------------------------------------------------------------------------------------------
X: 4 variables
        | 1.000    .       .       .       .       !     -1.342  -0.550  -0.677   0.401   0.052  ...|
        | 0.005   1.000    .       .       .       !     -1.102   0.554  -0.862   0.234   0.029  ...|
        |-0.010  -0.317   0.948    .       .       !      0.518  -0.426  -1.534  -1.406  -0.828  ...|
        |-0.405   0.514  -0.300   0.694    .       !     -1.083   0.091   1.856   0.869  -0.058  ...|

Y: 1 variable  
        |-0.407   0.594  -0.303   0.172   0.600    !     -0.030   1.188   0.863   0.508  -0.406  ...|

FS : 4 variables
        | 0.023   0.962   0.175  -0.210    .       !     -0.747   0.627  -1.615  -0.162  -0.029  ...|
        |-0.981  -0.020   0.007  -0.193    .       !      1.622   0.665   0.266  -0.544   0.029  ...|
        | 0.023   0.141  -0.976  -0.164    .       !     -0.107   0.447   1.386   1.293   0.921  ...|
        | 0.191  -0.235   0.129  -0.944    .       !      1.401   0.408  -2.147  -0.827   0.314  ...|


The left-hand block is the initial loadings matrix L1. It still has to be rotated
to PCA position and then to varimax position.
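
The standardizing, correlation and Cholesky steps [7] to [9] can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the data are synthetic random numbers (not the original Minitab data), only the five variables x1..x4 and y are used (the factor-score rows are left out so that the correlation matrix is not singular), and the variable names are mine.

```python
import numpy as np

# Synthetic stand-in data (NOT the original data):
# 5 variables along rows (x1..x4, y), 34 cases along columns.
rng = np.random.default_rng(0)
n = 34
raw = rng.standard_normal((5, n))
raw[4] += 0.5 * raw[0] - 0.4 * raw[3]      # let y depend on some of the x

# Step [7]: centre, then scale with the population SD (variance = sum(dev^2)/N)
Z = raw - raw.mean(axis=1, keepdims=True)
Z = Z / Z.std(axis=1, ddof=0, keepdims=True)

# Step [8]: correlation matrix of the row-variables
COR = Z @ Z.T / n

# Step [9]: Cholesky factor, lower triangular, with L @ L.T equal to COR
L = np.linalg.cholesky(COR)

print(np.round(L, 3))
```

Because the X-variables come first, their rows of the triangular factor use only the first four columns; whatever is specific to y ends up in the fifth column.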








The criterion for the PCA and varimax rotations involves only the X-variables,
so only rows 1..4 are selected for the criterion; for the collecting
phase of the PC-rotation all columns 1..5 are chosen:

[12] l1 = rot(l1,"pca",     1..4, 1..5)   // PC-rotation to collect loadings in first 4 factors
[13] l1 = rot(l1,"varimax", 1..4, 1..4)   // varimax-rotation of a part of the loadingsmatrix
[14] l1 = l1*:{-1,-1,-1,-1,1}             // adapting signs of result to the reference solution
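
The idea of step [12], a rotation whose criterion looks only at the X rows but which is applied to all rows, can be sketched as follows. This is not MatMate's rot() function: the SVD-based rotation is my stand-in for the "pca" step only, the varimax step [13] is omitted, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 34
raw = rng.standard_normal((5, n))           # rows: x1..x4, y (synthetic data)
raw[4] += 0.5 * raw[0] - 0.4 * raw[3]
Z = raw - raw.mean(axis=1, keepdims=True)
Z /= Z.std(axis=1, ddof=0, keepdims=True)
COR = Z @ Z.T / n
L = np.linalg.cholesky(COR)                 # initial 5x5 loadings

# Criterion computed from the X rows only (rows 0..3), over all 5 columns.
# The right singular vectors of that 4x5 block form an orthogonal 5x5
# rotation that brings the X rows into principal-axes position.
_, _, Vt = np.linalg.svd(L[:4, :])          # Vt is 5x5 (full_matrices=True)
L_rot = L @ Vt.T                            # same rotation applied to ALL rows

x_fifth_column = L_rot[:4, 4]               # X loadings on the 5th factor
y_loadings = L_rot[4, :]                    # y: 4 common loadings + residual
```

Since the rotation is orthogonal, the rotated loadings still reproduce the correlation matrix, and the X-variables keep zero loadings on the fifth (residual) factor.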
     


Varimax loadings, with the original factor scores included as a reference:
    
[15] disp = l1||null(9,1)||subsp(z,1..5)  // in l1 is now the PC-regression-solution


                Loadings                           !     Scores ... (only first 5 cases shown)
------------------------------------------------------------------------------------------------------

X: 4 variables
        | 0.022  -0.981   0.023   0.191    .       .     -1.342  -0.550  -0.677   0.401   0.052 |
        | 0.962  -0.025   0.142  -0.234    .       .     -1.102   0.554  -0.862   0.234   0.029 |
        |-0.139   0.023  -0.971   0.195    .       .      0.518  -0.426  -1.534  -1.406  -0.828 |
        | 0.286   0.251   0.243  -0.892    .       .     -1.083   0.091   1.856   0.869  -0.058 |

Y: 1 variable   
        | 0.473   0.352   0.343  -0.419   0.600    .     -0.030   1.188   0.863   0.508  -0.406 |

FS : 4 variables
        | 1.000    .       .       .       .       .     -0.747   0.627  -1.615  -0.162  -0.029 |
        |  .      1.000    .       .       .       .      1.622   0.665   0.266  -0.544   0.029 |
        |  .       .      1.000    .       .       .     -0.107   0.447   1.386   1.293   0.921 |
        |  .       .       .      1.000    .       .      1.401   0.408  -2.147  -0.827   0.314 |
------------------------------------------------------------------------------------------------------


From the lower identity block you can see that the factor solution was correctly
reproduced.
The loadings of Y on these four factors are already the PC-regression weights:




The reference you have given:     

     ; ## Regression Analysis: Ys versus fs1, fs2, fs3, fs4
     ; The regression equation is
     ; Ys = 0.000 + 0.472 fs1 + 0.352 fs2 + 0.343 fs3 - 0.419 fs4
     ; 
     ; Predictor        Coef     SE Coef          T        P       VIF
     ; Constant       0.0000      0.1097       0.00    1.000
     ; fs1            0.4724      0.1114       4.24    0.000       1.0
     ; fs2            0.3523      0.1114       3.16    0.004       1.0
     ; fs3            0.3426      0.1114       3.08    0.005       1.0
     ; fs4           -0.4190      0.1114      -3.76    0.001       1.0
     ; 

You find these coefficients in the Y-row above.
Note that this method also computes a residual factor for Y, on which
Y has a loading of 0.6 (5th column).
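
As a quick plausibility check of that residual loading: Y is standardized, so the squared loadings in its row must sum to 1, and the residual loading follows from the four PC-regression weights. The small Python sketch below uses only the rounded numbers printed in the Y-row; the variable names are mine.

```python
import math

# Loadings of Y on the four varimax factors, read from the Y-row above
ly = [0.473, 0.352, 0.343, -0.419]

# Share of Y's variance carried by the four factors
explained = sum(v * v for v in ly)

# What is left over belongs to the residual factor
residual_loading = math.sqrt(1.0 - explained)

print(round(explained, 3), round(residual_loading, 3))
```

Up to rounding of the printed loadings, the result agrees with the 0.600 shown in the 5th column.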

=================================================================================================


To compute factor scores you need the inverse of the loadings matrix:
    

[16] lxy = subzl(l1,1..5)                 // extracting the rows of xy-part for inversion
[17] lxyi = inv(lxy)                      // invert the loadingsmatrix of xy
   
     
[18] disp = lxyi'
      disp : 
        |-0.105  -1.095  -0.084  -0.365   0.519 |
        | 1.146   0.097  -0.087   0.371  -0.652 |
        | 0.089  -0.081  -1.104  -0.295   0.403 |
        |-0.303  -0.278  -0.236  -1.361  -0.412 |
        |  .       .       .       .      1.668 |
     

;your reference-solution ----------------
;t Factor Score Coefficients (FSC)
;t Variable    Factor1    Factor2    Factor3    Factor4
;t X1s          -0.105     -1.095     -0.084     -0.365
;t X2s           1.146      0.097     -0.087      0.371
;t X3s           0.089     -0.081     -1.104     -0.294
;t X4s          -0.303     -0.278     -0.237     -1.361

     
; Now compute factor-scores. To compute 5 factors (the last is
; the residual for Y) from Z, the first 5 rows have to be taken:
     
[19] fsc = inv(lxy)*subzl(Z,1..5) // computing factor-scores
     


Displaying factor-scores. The scheme represents the equation 

        L1 * FSC  =  [X,Y,OFS]     = [L1_X,L1_Y,L1_OFS] * FSC

or in 2-dimensional way
                       FSC
                   *----------
           [L1_X]    [ X  ]
           [L1_Y]    [ Y  ]
           [L1_O]    [ OFS]



[21] disp = { null(5,6)       || subsp(FSC,1..5) , _    // factorscores, 5 rows, and the first 5 cases
              l1 || null(9,1) || subsp(Z  ,1..5)}       // loadings and data-scores

                                                   !     Scores ... (only first 5 cases shown)
------------------------------------------------------------------------------------------------------
                      computed factors are identical with the reference-factors, see block below
           .       .       .       .       .       !     -0.747   0.627  -1.615  -0.162  -0.029 
           .       .       .       .       .       !      1.622   0.665   0.266  -0.544   0.029 
           .       .       .       .       .       !     -0.107   0.447   1.386   1.293   0.921 
           .       .       .       .       .       !      1.401   0.408  -2.147  -0.827   0.314 
                      residual-factor for Y
           .       .       .       .       .       !      0.625   1.126   0.267  -0.022  -0.979 
             Factor-loadings                       
-------------------------------------------------

X: 4 variables
        | 0.022  -0.981   0.023   0.191    .       .     -1.342  -0.550  -0.677   0.401   0.052 
        | 0.962  -0.025   0.142  -0.234    .       .     -1.102   0.554  -0.862   0.234   0.029 
        |-0.139   0.023  -0.971   0.195    .       .      0.518  -0.426  -1.534  -1.406  -0.828 
        | 0.286   0.251   0.243  -0.892    .       .     -1.083   0.091   1.856   0.869  -0.058 

Y: 1 variable   
        | 0.473   0.352   0.343  -0.419   0.600    .     -0.030   1.188   0.863   0.508  -0.406 

FS : 4 variables
        | 1.000    .       .       .       .       .     -0.747   0.627  -1.615  -0.162  -0.029 
        |  .      1.000    .       .       .       .      1.622   0.665   0.266  -0.544   0.029 
        |  .       .      1.000    .       .       .     -0.107   0.447   1.386   1.293   0.921 
        |  .       .       .      1.000    .       .      1.401   0.408  -2.147  -0.827   0.314 
     

This shows that the computed factor scores are identical to those MINITAB has computed.
Additionally, the residual factor for Y was generated.
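
The factor-score computation of step [19] can be checked by hand for the first case. The Python sketch below multiplies the inverse loadings (typed in from the display at step [18], rounded to three decimals) with the standardized values of x1..x4 and y for case 1 (first score column of the tables). The variable names are mine, and the result can only agree with the listing up to rounding.

```python
import numpy as np

# Transposed inverse loadings as displayed at step [18]:
# rows = x1..x4, y ; columns = factors 1..4 and the residual factor
lxyi_t = np.array([
    [-0.105, -1.095, -0.084, -0.365,  0.519],
    [ 1.146,  0.097, -0.087,  0.371, -0.652],
    [ 0.089, -0.081, -1.104, -0.295,  0.403],
    [-0.303, -0.278, -0.236, -1.361, -0.412],
    [ 0.000,  0.000,  0.000,  0.000,  1.668],
])

# Standardized values of x1..x4 and y for the first case
z_case1 = np.array([-1.342, -1.102, 0.518, -1.083, -0.030])

# fsc = inv(lxy) * z, i.e. the transpose of the displayed matrix times z
fs_case1 = lxyi_t.T @ z_case1

print(np.round(fs_case1, 3))
```

The five values should come out close to the first score column of the factor rows above (-0.747, 1.622, -0.107, 1.401) and of the residual factor (0.625).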






===============================================================================================================

== Part 2

Computing the beta-weights in terms of X 
===============================================================================================================

     ;-- now compute beta-weights, i.e. loadings in terms of x1..x4

For the inversion and recalculation only the X-factors are used now;
if you included the residual factor, the resulting block matrix for X and Y
would be an identity. So only the X-part of the loadings matrix L1 is used here.



[22]  lx =   sub(l1,  1..4:1..4)
[23] l1x = subsp(l1,1..4)*inv(lx)  // new loadingsmatrix, assuming factors were identical
                                   // with X-data
     
[24] disp = l1x

             Factor-loadings
--------------------------------------------------------------------------------------------------
X: 4 variables
        | 1.000    .       .       .    |
        |  .      1.000    .       .    |
        |  .       .      1.000    .    |
        |  .       .       .      1.000 |

Y: 1 variable   
        |-0.311   0.391  -0.242   0.247 |

FS : 4 variables
        |-0.105   1.146   0.089  -0.303 |
        |-1.095   0.097  -0.081  -0.278 |
        |-0.084  -0.087  -1.104  -0.237 |
        |-0.365   0.371  -0.294  -1.361 |
-------------------------------------------------------------------------------------------------
     
The beta-weights for Y in terms of X are the entries in the Y-row.
The upper identity block indicates that the factors to which the
Y-loadings refer are indeed identical with the X-data.
The lower loadings block indicates how your given reference factor
scores are composed of the X-data (it is just the inverted varimax
loadings matrix).
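
The final multiplication BETA = LY * inv(LX) can be reproduced from the printed numbers alone. The Python sketch below takes the Y-row of the varimax loadings and the inverted X-loadings (the FS block of the display above), both rounded to three decimals; the variable names are mine.

```python
import numpy as np

# Y-row of the varimax loadings (= PC-regression weights)
ly = np.array([0.473, 0.352, 0.343, -0.419])

# inv(LX), read from the FS block of the display above
lx_inv = np.array([
    [-0.105,  1.146,  0.089, -0.303],
    [-1.095,  0.097, -0.081, -0.278],
    [-0.084, -0.087, -1.104, -0.237],
    [-0.365,  0.371, -0.294, -1.361],
])

# Beta-weights of Y in terms of the standardized X-variables
beta = ly @ lx_inv

print(np.round(beta, 3))
```

Up to rounding this gives the Y-row of the display, so the PC-regression equation reads, in terms of the standardized variables, roughly Ys = -0.311 X1s + 0.391 X2s - 0.242 X3s + 0.247 X4s.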
==========================================================================================================








[EMAIL PROTECTED] wrote:
> 
> Dear Gottfried Helms,
> 
> Thanks for your response. I appreciate your help.  I believe you understand
> my problem correctly. I've been trying to understand your two e-mails with
> difficulty.  Maybe I'm confused with the terminologies: fs1 means factor
> score 1, and so forth. FSC stands for factor score coefficients and please
> see the Minitab output below.  I'm wondering whether you could elaborate
> one more time.
> Basically, I'm interested in the variables themselves rather than the
> principal components.
> 
> The principal component regression is :
> Ys =  0.472*fs1 + 0.352*fs2 + 0.343*fs3 - 0.419*fs4
> 
> I'm wondering how the above PCR equation can be re-expressed in terms of
> the variables themselves.  For example, Ys = f(X1, X2, X3, X4).
> 
> Thanks for your help.
> 
> Sangdon Lee, Ph.D.,
> GM Tech. Center, MI, USA.
> [EMAIL PROTECTED]
>

.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
