1. I didn't see in your original email that you wanted V to be orthogonal, only that it's columns have length 1. You have a solution satisfying the latter constraint, but not the former.
2. I don't have time now to sort out the details, and I don't have them on the top of my head. I just entered "lda" into R 1.6.2 [after library(MASS)] and got the following:
> lda function (x, ...) { if (is.null(class(x))) class(x) <- data.class(x) UseMethod("lda", x, ...) }
To decode 'UseMethod("lda", ...)', I requested 'methods("lda")' with the following result:
> methods("lda") [1] "lda.data.frame" "lda.default" "lda.formula" "lda.matrix"
Have you tried listing each of these 4 functions and working through them step by step? I think this should answer your question. Also see Venables and Ripley (2002) Modern Applied Statistics with S, index entry for "lda".
hth. spencer graves
Christoph Lehmann wrote: > thanks a lot, Spencer > > The problem is the following: my textbook has an example with the data: > > X> x > x1 x2 x3 > 1 3 3 4 > 2 4 4 3 > 3 4 4 6 > 4 2 5 5 > 5 2 4 5 > 6 3 4 6 > 7 3 4 4 > 8 2 5 5 > 9 4 3 6 > 10 5 5 6 > 11 4 5 7 > 12 4 6 4 > 13 3 6 6 > 14 4 7 6 > 15 6 5 6 > -- > >>y > > 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 > 1 1 1 1 1 1 2 2 2 2 3 3 3 3 3 > -- > >>Dtot <- (t(x)%*%x-t(xbar)%*%xbar) >>Dtot > > > x1 x2 x3 > x1 17.733333 2.666667 4.866667 > x2 2.666667 17.333333 4.333333 > x3 4.866667 4.333333 16.933333 > -- > >>A <- cbind(tapply(x[,1],y,sum), tapply(x[,2],y,sum), > > tapply(x[,3],y,sum)) > >>A > > [,1] [,2] [,3] > 1 18 24 29 > 2 14 17 21 > 3 21 29 29 > >>G <- apply(x,2,sum) >>G > > x1 x2 x3 > 53 70 79 > >>p <- ncol(x) >>k <- length(freq) >>N <- sum(freq) >>Dtreat <- array(0,c(p,p)) >>k <- length(freq) >>for (i in 1:p) > > + { > + for (j in 1:k) > + { > + for (h in 1:k) > + { > + Dtreat[i,j] <- Dtreat[i,j] + A[h,i]*A[h,j]/freq[h] > + } > + Dtreat[i,j] <- Dtreat[i,j] - G[i]*G[j]/N > + } > + } > >>Dtreat > > [,1] [,2] [,3] > [1,] 3.933333 5.966667 3.166667 > [2,] 5.966667 9.783333 4.783333 > [3,] 3.166667 4.783333 2.550000 > -- > >>Derror <- Dtot-Dtreat >>Derror > > > x1 x2 x3 > x1 13.8 -3.30 1.70000 > x2 -3.3 7.55 -0.45000 > x3 1.7 -0.45 14.38333 > > -- > >>eigen(Dtreat%*%solve(Derror)) > > $values > [1] 2.300398e+00 2.039672e-02 -1.907034e-15 > > $vectors > [,1] [,2] [,3] > [1,] -0.4870772 0.6813155 -0.6076020 > [2,] -0.7809602 -0.4342229 0.1539928 > [3,] -0.3909693 0.5892874 0.7791701 > > >>V <- eigen(Dtreat%*%solve(Derror))$vectors >>V > > [,1] [,2] [,3] > [1,] -0.4870772 0.6813155 -0.6076020 > [2,] -0.7809602 -0.4342229 0.1539928 > [3,] -0.3909693 0.5892874 0.7791701 > > the textbook (SPSS) has similar eigenvalues, but only two!: > > lambda1 = 2.30048, lambda2 = 0.02091 > , but as I wrote in the last mail: different eigenvectors > > Let's start here with your recommendation: > first, it seems, since the last eigenvalue is almost 0, that the > eigenvectors V are not orthogonal: > > >>t(V)%*%V > > [,1] [,2] [,3] > [1,] 1.0000000 -0.22313575 -0.12894473 > [2,] -0.2231357 1.00000000 -0.02168078 > [3,] -0.1289447 -0.02168078 1.00000000 > > let's continue anyway? > >>D.5 <- chol(Derror) >>t(D.5) %*% D.5 > > > x1 x2 x3 > x1 13.8 -3.30 1.70000 > x2 -3.3 7.55 -0.45000 > x3 1.7 -0.45 14.38333 > >>Dm.5 <- solve(D.5) >>t(Dm.5) %*% Derror %*% Dm.5 > > > x1 x2 x3 > x1 1.000000e+00 -2.523481e-17 -1.097755e-18 > x2 -6.625163e-18 1.000000e+00 -2.120970e-18 > x3 4.501901e-18 4.460942e-19 1.000000e+00 > perfectly orthogonal > >>t(V)%*%t(Dm.5)%*%Dfehler%*%Dm.5%*%V > > > [,1] [,2] [,3] > [1,] 1.0000000 -0.22313575 -0.12894473 > [2,] -0.2231357 1.00000000 -0.02168078 > [3,] -0.1289447 -0.02168078 1.00000000 > again, equals t(V)%*%V not orthogonal. > > -- I think it has to do with the fact, that the textbook considers the > third eigenvalue as = 0 and then gets the Vstar eigenvectors (which I > try to reproduce: > > Vstar = > [,1] [,2] [,3] > [1,] 0.1689 0.1419 -0.1825 > [2,] 0.3498 -0.1597 0.0060 > [3,] 0.0625 0.1422 0.2154 > > - > > Spencer if you find some minutes time to help me reproduce this example, > it would be very nice (the data are from Jones 1961. He investigated > whether essays written by children from lower, middle, upper class > differ in sentence length, choosen words, complexity of sentence) > > Cheers > > Christoph > ########################################## The following satisfies some of your constraints but I don't know if it satisfies all of them.
Let V = eigenvectors normalized so t(V) %*% V = I. Also, let D.5 = some square root matrix, so t(D.5) %*% D.5 = Derror, and Dm.5 = solve(D.5) = invers of D.5. The Choleski decomposition ("chol") provides one such solution, but you can construct a symmetric square root using "eigen". Then Vstar = Dm.5%*%V will have the property you mentioned below.
Consider the following:
> (Derror <- array(c(1,1,1,4), dim=c(2,2))) [,1] [,2] [1,] 1 1 [2,] 1 4 > D.5 <- chol(Derror) > t(D.5) %*% D.5 [,1] [,2] [1,] 1 1 [2,] 1 4 > (Dm.5 <- solve(D.5)) [,1] [,2] [1,] 1 -0.5773503 [2,] 0 0.5773503 > (t(Dm.5) %*% Derror %*% Dm.5) [,1] [,2] [1,] 1 0 [2,] 0 1
Thus,t(Vstar)%*%Derror%*%Vstar = t(V)%*%t(Dm.5)%*%Derror%*%Dm.5%*%V = t(V)%*%V = I.
hope this helps. spencer graves
Christoph Lehmann wrote:
> Hi dear R-users
>
> I try to reproduce the steps included in a LDA. Concerning the eigenvectors there is
> a difference to SPSS. In my textbook (Bortz)
> it says, that the matrix with the eigenvectors
>
> V
>
> usually are not normalized to the length of 1, but in the way that the
> following holds (SPSS does the same thing):
>
> t(Vstar)%*%Derror%*%Vstar = I
>
>
> where Vstar are the normalized eigenvectors. Derror is an "error" or
> "within" squaresum- and crossproduct matrix (squaresum of the p
> variables on the diagonale, and the non-diagonal elements are the sum of
> the crossproducts). For Derror the following holds: Dtotal = Dtreat +
> Derror.
>
> Since I assume that many of you are familiar with this transformation:
> can anybody of you tell me, how to conduct this transformation in R?
> Would be very nice. Thanks a lot
>
> Cheers
>
> Christoph
>
______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help