Hi Sangdon,
here is a detailed protocol of how to proceed.
Part 1 reproduces your varimax factors and the PC-regression weights.
Part 2 derives the regression weights for the X-variables from there.
As I said, it is just a matrix multiplication.
In short:
if LXY is your varimax-rotated loadings matrix, LX (4x4) is the part
holding the X-loadings, and LY (1x4) is the part holding the Y-loadings, with
LXY = [LX,LY] = varimax(cholesky(CORR))
then the beta-weights for Y expressed in terms of X are
BETA = LY * inv(LX)
Hope it helps... :-)
Regards-
Gottfried Helms
*********************** Use fixed font ****************************************
== Part 1
; MatMate listing of: 14.03.03 21:46:52
;=================================================================================================
;-------------- Getting data --------------------------------------------
;=================================================================================================
[0] set listing=on ccdezweite=3 ccfeldweite=7
; get your data as matrix from file:
[1] A = CSVDATEI("F:\TEMP\DATA.CSV")'
// stored as: y x1..x4 fs1..fs4 ; transposed to have the variables along rows
[2] n = columns(A) // = 34 cases
[3] Y = subzl(A,1) // splitting the rows of data in var-groups
[4] X = subzl(A,2..5)
[5] OFS = subzl(A,6..9) // I use the O-riginal F-actor S-cores for checking
[6] Z = {x,y,ofs} // reorganizing data
[7] Z = Z/:stddevzl(Z) // MatMate uses s^2 = sum(dev^2)/N for variance, so I have to renormalize
// the Minitab data
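Why step [7] matters can be checked with a small NumPy sketch. The data here are made up (the array `A` and its contents are assumptions for illustration, not the Minitab file), and the rows are centered explicitly, which the listing does not need because the Minitab columns were already standardized. With the population standard deviation (divisor N), Z*Z'/n has a unit diagonal and equals the correlation matrix; with the sample standard deviation (divisor N-1) the diagonal would come out as (n-1)/n.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 5 variables in rows, 34 cases in columns (like A' in the listing)
A = rng.normal(size=(5, 34))
A[4] += 0.8 * A[0]                      # make one variable correlated with another

n = A.shape[1]
Zc = A - A.mean(axis=1, keepdims=True)  # center each variable (row)

# Population SD, sqrt(sum(dev^2)/N): the convention the listing attributes to MatMate
Z = Zc / Zc.std(axis=1, ddof=0, keepdims=True)
COR = Z @ Z.T / n

# Sample SD, sqrt(sum(dev^2)/(N-1)): scaling that leaves the diagonal slightly below 1
Z_sample = Zc / Zc.std(axis=1, ddof=1, keepdims=True)
COR_sample_scaled = Z_sample @ Z_sample.T / n

print(np.round(np.diag(COR), 3))
print(np.round(np.diag(COR_sample_scaled), 3))
```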
;=================================================================================================
; Correlation/Factorization
;=================================================================================================
[8] COR = Z*Z'/n // reproducing correlation matrix
[9] L = cholesky(COR) // factorizing cor (cholesky-triangular shape)
[10] L1 = subsp(L,1..5) // cholesky produces only 5 factors: use only the first 5 columns of L
[11] disp = l1 || null(9,1) || subsp(z,1..5)
disp:
         Factor-loadings             !   Scores ... (only first 5 cases shown)
------------------------------------------------------------------------------------------------------
X: 4 variables
|  1.000      .      .      .      . ! -1.342 -0.550 -0.677  0.401  0.052 ...|
|  0.005  1.000      .      .      . ! -1.102  0.554 -0.862  0.234  0.029 ...|
| -0.010 -0.317  0.948      .      . !  0.518 -0.426 -1.534 -1.406 -0.828 ...|
| -0.405  0.514 -0.300  0.694      . ! -1.083  0.091  1.856  0.869 -0.058 ...|
Y: 1 variable
| -0.407  0.594 -0.303  0.172  0.600 ! -0.030  1.188  0.863  0.508 -0.406 ...|
FS : 4 variables
|  0.023  0.962  0.175 -0.210      . ! -0.747  0.627 -1.615 -0.162 -0.029 ...|
| -0.981 -0.020  0.007 -0.193      . !  1.622  0.665  0.266 -0.544  0.029 ...|
|  0.023  0.141 -0.976 -0.164      . ! -0.107  0.447  1.386  1.293  0.921 ...|
|  0.191 -0.235  0.129 -0.944      . !  1.401  0.408 -2.147 -0.827  0.314 ...|
The left-hand block is the initial loadings matrix L1. It has to be rotated
to the PCA position and then to the varimax position.
Only the x-variables define the criterion for PCA and varimax,
so only rows 1..4 are selected for the criterion; for the collecting
phase of the PC-rotation all columns 1..5 are chosen:
[12] l1 = rot(l1,"pca", 1..4, 1..5) // PC-rotation to collect loadings in first 4 factors
[13] l1 = rot(l1,"varimax", 1..4, 1..4) // varimax rotation of a part of the loadings matrix
[14] l1 = l1*:{-1,-1,-1,-1,1} // adapting signs of result to the reference solution
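For readers without MatMate, the idea behind steps [9] to [14] can be sketched in NumPy. This is only an illustration under assumptions: the data are made up, the `varimax` helper below is a common SVD-based raw varimax (no Kaiser normalization) and is not claimed to match what MatMate's `rot()` or Minitab do internally, and the separate PCA step is skipped because in this sketch the X rows of the Cholesky factor already occupy only the first four columns. The point it shows is that the rotation is orthogonal, so the reproduced correlations do not change, and that rotating only columns 1..4 leaves the Y residual in column 5 untouched.

```python
import numpy as np

def varimax(L, max_iter=200, tol=1e-10):
    """Raw varimax (hypothetical helper). Returns (rotated loadings, rotation matrix)."""
    p, k = L.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient of the varimax criterion; the SVD maps it to the nearest orthogonal matrix
        G = L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        if s.sum() < crit * (1 + tol):   # stop when the criterion no longer grows
            break
        crit = s.sum()
    return L @ R, R

rng = np.random.default_rng(1)
# Hypothetical data: rows x1..x4 and y, 34 cases in columns
X = rng.normal(size=(4, 34))
X[3] += 0.6 * X[1] - 0.4 * X[0]
y = 0.5 * X[0] - 0.3 * X[2] + rng.normal(scale=0.7, size=34)
COR = np.corrcoef(np.vstack([X, y]))

L = np.linalg.cholesky(COR)      # lower triangular, L @ L.T reproduces COR

# Criterion uses only the X rows and columns 1..4; column 5 (Y residual) stays fixed
_, R4 = varimax(L[:4, :4])
R = np.eye(5)
R[:4, :4] = R4
L_rot = L @ R

print(np.round(L_rot, 3))
```

Flipping the signs of whole columns, as in step [14], is also an orthogonal operation, so it changes neither the reproduced correlations nor the fit.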
Varimax loadings, here also including the original factor scores as reference:
[15] disp = l1||null(9,1)||subsp(z,1..5) // in l1 is now the PC-regression-solution
         Loadings                            !   Scores ... (only first 5 cases shown)
------------------------------------------------------------------------------------------------------
X: 4 variables
|  0.022 -0.981  0.023  0.191      .      . -1.342 -0.550 -0.677  0.401  0.052 |
|  0.962 -0.025  0.142 -0.234      .      . -1.102  0.554 -0.862  0.234  0.029 |
| -0.139  0.023 -0.971  0.195      .      .  0.518 -0.426 -1.534 -1.406 -0.828 |
|  0.286  0.251  0.243 -0.892      .      . -1.083  0.091  1.856  0.869 -0.058 |
Y: 1 variable
|  0.473  0.352  0.343 -0.419  0.600      . -0.030  1.188  0.863  0.508 -0.406 |
FS : 4 variables
|  1.000      .      .      .      .      . -0.747  0.627 -1.615 -0.162 -0.029 |
|      .  1.000      .      .      .      .  1.622  0.665  0.266 -0.544  0.029 |
|      .      .  1.000      .      .      . -0.107  0.447  1.386  1.293  0.921 |
|      .      .      .  1.000      .      .  1.401  0.408 -2.147 -0.827  0.314 |
------------------------------------------------------------------------------------------------------
From the lower identity block you can see that the factor solution was
reproduced correctly.
The loadings of Y on these four factors are already the PC-regression weights.
The reference you have given:
; ## Regression Analysis: Ys versus fs1, fs2, fs3, fs4
; The regression equation is
; Ys = 0.000 + 0.472 fs1 + 0.352 fs2 + 0.343 fs3 - 0.419 fs4
;
; Predictor      Coef  SE Coef      T      P  VIF
; Constant     0.0000   0.1097   0.00  1.000
; fs1          0.4724   0.1114   4.24  0.000  1.0
; fs2          0.3523   0.1114   3.16  0.004  1.0
; fs3          0.3426   0.1114   3.08  0.005  1.0
; fs4         -0.4190   0.1114  -3.76  0.001  1.0
;
You find these Coef values in the Y-row above.
Note that this method also computes a residual factor for Y, with
a loading of 0.6 for Y in the 5th column.
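The residual loading can be cross-checked from the Y-row alone. Because the factors are uncorrelated with unit variance (VIF 1.0 in the Minitab output) and Y is standardized, the squared loadings on the four factors sum to the explained share of Y's variance, and the residual loading is the square root of what is left. A short sketch using the rounded values from the display:

```python
import math

# Loadings of Y on the four varimax factors, read from the Y row of the display
ly = [0.473, 0.352, 0.343, -0.419]

r_squared = sum(b * b for b in ly)             # variance of Y explained by the factors
residual_loading = math.sqrt(1.0 - r_squared)  # loading of Y on its own residual factor

print(round(r_squared, 3), round(residual_loading, 3))
```

Up to rounding of the inputs this gives the 0.6 shown in the 5th column.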
=================================================================================================
For the computation of factor scores you need the inverse:
[16] lxy = subzl(l1,1..5) // extracting the rows of the xy-part for inversion
[17] lxyi = inv(lxy) // invert the loadingsmatrix of xy
[18] disp = lxyi'
disp :
|-0.105 -1.095 -0.084 -0.365 0.519 |
| 1.146 0.097 -0.087 0.371 -0.652 |
| 0.089 -0.081 -1.104 -0.295 0.403 |
|-0.303 -0.278 -0.236 -1.361 -0.412 |
| . . . . 1.668 |
;your reference-solution ----------------
;t Factor Score Coefficients (FSC)
;t Variable Factor1 Factor2 Factor3 Factor4
;t X1s -0.105 -1.095 -0.084 -0.365
;t X2s 1.146 0.097 -0.087 0.371
;t X3s 0.089 -0.081 -1.104 -0.294
;t X4s -0.303 -0.278 -0.237 -1.361
; Now compute factor scores. To compute 5 factors (the last is the residual for Y)
; from Z, the first 5 rows have to be taken:
[19] fsc = inv(lxy)*subzl(Z,1..5) // computing factor scores
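Steps [16] to [19] can be retraced in NumPy for a single case. The numbers are copied from the displays in this listing (rounded to three decimals, so agreement is only approximate); the variable names are mine. The sketch inverts the 5x5 XY part of the loadings matrix and applies it to the standardized values of the first case.

```python
import numpy as np

# XY part of the rotated loadings matrix (rows x1..x4, y), copied from the display
lxy = np.array([
    [ 0.022, -0.981,  0.023,  0.191, 0.000],
    [ 0.962, -0.025,  0.142, -0.234, 0.000],
    [-0.139,  0.023, -0.971,  0.195, 0.000],
    [ 0.286,  0.251,  0.243, -0.892, 0.000],
    [ 0.473,  0.352,  0.343, -0.419, 0.600],
])

# Standardized values of x1..x4 and y for the first case (first score column)
z_case1 = np.array([-1.342, -1.102, 0.518, -1.083, -0.030])

lxyi = np.linalg.inv(lxy)        # its transpose holds the factor score coefficients
fsc_case1 = lxyi @ z_case1       # factor scores 1..4 plus the residual factor of Y

print(np.round(lxyi.T, 3))
print(np.round(fsc_case1, 3))
```

The first four entries of `fsc_case1` should be close to the first column of the FS rows in the display, and the fifth to the first residual-factor score.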
Displaying factor scores. The scheme represents the equation
L1 * FSC = [X,Y,OFS] = [L1_X,L1_Y,L1_OFS] * FSC
or, laid out in two dimensions:
          *   FSC
          ------------
[L1_X]        [ X   ]
[L1_Y]        [ Y   ]
[L1_O]        [ OFS ]
[21] disp = { null(5,6) || subsp(FSC,1..5) , _ // factor scores, 5 rows, and the first 5 cases
l1 || null(9,1) || subsp(Z ,1..5)} // loadings and data scores
                                             !   Scores ... (only first 5 cases shown)
------------------------------------------------------------------------------------------------------
computed factors are identical with the reference factors, see block below
       .      .      .      .      . ! -0.747  0.627 -1.615 -0.162 -0.029
       .      .      .      .      . !  1.622  0.665  0.266 -0.544  0.029
       .      .      .      .      . ! -0.107  0.447  1.386  1.293  0.921
       .      .      .      .      . !  1.401  0.408 -2.147 -0.827  0.314
residual factor for Y
       .      .      .      .      . !  0.625  1.126  0.267 -0.022 -0.979
Factor-loadings
-------------------------------------------------
X: 4 variables
|  0.022 -0.981  0.023  0.191      .      . -1.342 -0.550 -0.677  0.401  0.052
|  0.962 -0.025  0.142 -0.234      .      . -1.102  0.554 -0.862  0.234  0.029
| -0.139  0.023 -0.971  0.195      .      .  0.518 -0.426 -1.534 -1.406 -0.828
|  0.286  0.251  0.243 -0.892      .      . -1.083  0.091  1.856  0.869 -0.058
Y: 1 variable
|  0.473  0.352  0.343 -0.419  0.600      . -0.030  1.188  0.863  0.508 -0.406
FS : 4 variables
|  1.000      .      .      .      .      . -0.747  0.627 -1.615 -0.162 -0.029
|      .  1.000      .      .      .      .  1.622  0.665  0.266 -0.544  0.029
|      .      .  1.000      .      .      . -0.107  0.447  1.386  1.293  0.921
|      .      .      .  1.000      .      .  1.401  0.408 -2.147 -0.827  0.314
This shows that the computed factor scores are identical to what MINITAB has computed.
Additionally, the residual factor for Y was generated.
===============================================================================================================
== Part 2
Computing of the beta-weights in terms of X
===============================================================================================================
;-- now compute beta-weights, i.e. loadings in terms of x1..x4
For inversion and recalculation only the x-factors are used now;
if you included the residual, the resulting block matrix for X and Y
would be an identity. So only the X-part of the loadings matrix L1 is used here:
[22] lx = sub(l1, 1..4:1..4)
[23] l1x = subsp(l1,1..4)*inv(lx) // new loadings matrix, assuming the factors were identical
// with the X-data
[24] disp = l1x
Factor-loadings
--------------------------------------------------------------------------------------------------
X: 4 variables
| 1.000 . . . |
| . 1.000 . . |
| . . 1.000 . |
| . . . 1.000 |
Y: 1 variable
|-0.311 0.391 -0.242 0.247 |
FS : 4 variables
|-0.105 1.146 0.089 -0.303 |
|-1.095 0.097 -0.081 -0.278 |
|-0.084 -0.087 -1.104 -0.237 |
|-0.365 0.371 -0.294 -1.361 |
-------------------------------------------------------------------------------------------------
The beta-weights for Y in terms of X are the entries in the y-row, i.e.
Ys = -0.311*X1s + 0.391*X2s - 0.242*X3s + 0.247*X4s
The upper identity block indicates that the factors to which the
Y-loadings refer are indeed identical with the X-data.
The lower loadings block indicates how your given reference factor scores
are composed of X-data (it is just the inverted varimax loadings matrix).
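The same matrix product can be retraced in NumPy with the rounded loadings from the displays (variable names are mine, and the results agree with the listing only up to rounding). As an extra check, the sketch predicts Y for the first case from its X-values and adds the residual part, 0.600 times the residual-factor score 0.625 taken from the Part 1 display, which should land near the observed value of -0.030.

```python
import numpy as np

# X part of the varimax loadings (rows x1..x4, columns factor 1..4), from the display
lx = np.array([
    [ 0.022, -0.981,  0.023,  0.191],
    [ 0.962, -0.025,  0.142, -0.234],
    [-0.139,  0.023, -0.971,  0.195],
    [ 0.286,  0.251,  0.243, -0.892],
])
# Loadings of Y on the same four factors (the PC-regression weights)
ly = np.array([0.473, 0.352, 0.343, -0.419])

beta = ly @ np.linalg.inv(lx)    # BETA = LY * inv(LX): weights of Y in terms of x1..x4

# Check on the first case: prediction from X plus residual part of Y
x_case1 = np.array([-1.342, -1.102, 0.518, -1.083])
y_hat_case1 = beta @ x_case1
y_rebuilt_case1 = y_hat_case1 + 0.600 * 0.625

print(np.round(beta, 3))
print(round(float(y_rebuilt_case1), 3))
```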
==========================================================================================================
In short:
if LXY is your varimax-rotated loadings matrix, LX (4x4) is the part
holding the X-loadings, and LY (1x4) is the part holding the Y-loadings, then
BETA = LY * inv(LX)
where
LXY = [LX,LY] = varimax(cholesky(CORR))
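Since all four factors are kept, BETA = LY * inv(LX) should coincide with the ordinary least-squares weights of standardized Y on the standardized X-variables, and the particular rotation should not matter because it cancels in the product. A small NumPy sketch on made-up data (the data and all names are assumptions for illustration, not the Minitab file) checks both points:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 34
# Hypothetical data: x1..x4 in rows, y as a noisy combination of them
X = rng.normal(size=(4, n))
X[3] += 0.6 * X[1]
y = 0.5 * X[0] - 0.3 * X[2] + rng.normal(scale=0.7, size=n)

def standardize(M):
    """Center and scale along the case axis (population SD)."""
    M = M - M.mean(axis=-1, keepdims=True)
    return M / M.std(axis=-1, keepdims=True)

Xs, ys = standardize(X), standardize(y)

CORR = np.corrcoef(np.vstack([Xs, ys]))
L = np.linalg.cholesky(CORR)              # unrotated loadings, X rows first
LX, LY = L[:4, :4], L[4, :4]
beta = LY @ np.linalg.inv(LX)

# Any orthogonal rotation Q of the four factors gives the same weights
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
beta_rotated = (LY @ Q) @ np.linalg.inv(LX @ Q)

# Ordinary least squares of ys on Xs (no intercept needed after centering)
beta_ols, *_ = np.linalg.lstsq(Xs.T, ys, rcond=None)

print(np.round(beta, 3))
print(np.round(beta_ols, 3))
```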
Hope it helps... :-)
Regards-
Gottfried Helms
[EMAIL PROTECTED] wrote:
>
> Dear Gottfried Helms,
>
> Thanks for your response. I appreciate your help. I believe you understand
> my problem correctly. I've been trying to understand your two e-mails with
> difficulty. Maybe I'm confused with the terminologies: fs1 means factor
> score 1, and so forth. FSC stands for factor score coefficients and please
> see the Minitab output below. I'm wondering whether you could elaborate
> one more time.
> Basically, I'm interested in the variables themselves rather than the
> principal components.
>
> The principal component regression is :
> Ys = 0.472*fs1 + 0.352*fs2 + 0.343*fs3 - 0.419*fs4
>
> I'm wondering how the above PCR equation can be re-expressed in terms of
> the variables themselves. For example, Ys = f(X1, X2, X3, X4).
>
> Thanks for your help.
>
> Sangdon Lee, Ph.D.,
> GM Tech. Center, MI, USA.
> [EMAIL PROTECTED]
>