Looks like you have some numerical precision issues. Why not use the svd function directly? (See below.)
-tgs x <- read.table( textConnection( "Sample1 0.7329881 0.76912670 2.45906143 -0.06411602 1.2427801 0.3785717 2.34508664 1.1043552 -0.1883830 0.6503095 Sample2 -2.0446131 1.72783245 -0.40965941 -0.21713655 -0.2386781 -0.1944390 -0.25181541 0.8882743 0.8404783 0.2531209 Sample3 1.7575174 0.03851425 -0.87537424 1.82811160 0.8342636 -0.8155942 -0.90893068 -0.5529098 -1.6586626 0.1761717 Sample4 0.2731749 0.24045167 -0.39913821 -0.48525909 0.1448994 -2.0173360 -0.01073639 -0.3219478 -1.1536431 -0.2521545 Sample5 -0.2014241 -1.04646151 0.28101160 0.74348390 0.1738312 -0.8431262 -0.08842512 1.2909658 1.2013136 0.6706926 Sample6 0.1743534 -1.70657357 -0.09170187 -0.55605031 -0.2940946 1.4525891 -0.39068509 -0.3373913 -0.0533732 0.9658389 Sample7 -0.8533191 -0.34438091 -1.23890437 -0.77360636 0.5926479 0.7742632 -1.12515017 -0.5720099 0.2243808 0.5420693 Sample8 0.4176988 -0.35906123 -0.07190644 0.90045123 -1.0621902 0.2693762 -0.38033715 0.6267548 0.4767652 0.3012347 Sample9 0.1088066 -0.32197951 0.46665158 -1.72560781 0.7375796 0.2794331 1.00171777 -0.1087306 1.2519195 -0.8848459 Sample10 -0.3651829 1.00253167 -0.12004007 0.34972942 -2.1310389 0.7162622 -0.19072440 -2.0173607 -0.9407956 -2.4224372"), header=F, as.is=TRUE) X<-as.matrix(x[,-1]) V <- t( eigen( t(X) %*% X , T )$vectors ) U <- eigen( X %*% t( X ) , T )$vectors D <- diag( sqrt( eigen( X %*% t( X ) , T )$values ) ) X - U %*% D %*% t(V) svd(X)$v - V svd(X)$u - U svd(X)$d - sqrt( eigen( X %*% t( X ) , T )$values ) X-svd(X)$u %*% diag( svd(X)$d ) %*% t( svd(X)$v ) On Fri, May 21, 2010 at 2:25 PM, Julia El-Sayed Moustafa < jelsa...@imperial.ac.uk> wrote: > > Hi all, > > As a molecular biologist by training, I'm fairly new to R (and > statistics!), > and was hoping for some advice. First of all, I'd like to apologise if my > question is more methodological rather than relating to a specific R > function. I've done my best to search both in the forum and elsewhere but > can't seem to find an answer which works in practice. > > I am carrying out principal component analysis in R using the Eigen > function, and have applied this to my dataset and can identify patterns in > my data which result from batch effects. What I'd ultimately like to do is > use PCA as a noise filter to remove the batch effects from my data, and the > way I would like to try and do this is by reconstructing the data, minus > the > components (or that part of them) which correlate with batch. > > As I have an extremely large dataset, as a first step I am attempting > simply > to apply the Eigen function to a dummy dataset consisting of only 10 > individuals and ten variables derived from my originial data, then > reconstruct the original dummy data matrix, just to see if I can get the > basics of the method to work first. Here is the dummy matrix (adjusted by > variable mean-subtraction and division by the variable standard deviation > applied to each data point) I am using: > > adjusted_dummyset > [,1] [,2] [,3] [,4] [,5] > [,6] [,7] [,8] [,9] [,10] > Sample1 0.7329881 0.76912670 2.45906143 -0.06411602 1.2427801 > 0.3785717 2.34508664 1.1043552 -0.1883830 0.6503095 > Sample2 -2.0446131 1.72783245 -0.40965941 -0.21713655 -0.2386781 > -0.1944390 -0.25181541 0.8882743 0.8404783 0.2531209 > Sample3 1.7575174 0.03851425 -0.87537424 1.82811160 0.8342636 > -0.8155942 -0.90893068 -0.5529098 -1.6586626 0.1761717 > Sample4 0.2731749 0.24045167 -0.39913821 -0.48525909 0.1448994 > -2.0173360 -0.01073639 -0.3219478 -1.1536431 -0.2521545 > Sample5 -0.2014241 -1.04646151 0.28101160 0.74348390 0.1738312 > -0.8431262 -0.08842512 1.2909658 1.2013136 0.6706926 > Sample6 0.1743534 -1.70657357 -0.09170187 -0.55605031 -0.2940946 > 1.4525891 -0.39068509 -0.3373913 -0.0533732 0.9658389 > Sample7 -0.8533191 -0.34438091 -1.23890437 -0.77360636 0.5926479 > 0.7742632 -1.12515017 -0.5720099 0.2243808 0.5420693 > Sample8 0.4176988 -0.35906123 -0.07190644 0.90045123 -1.0621902 > 0.2693762 -0.38033715 0.6267548 0.4767652 0.3012347 > Sample9 0.1088066 -0.32197951 0.46665158 -1.72560781 0.7375796 > 0.2794331 1.00171777 -0.1087306 1.2519195 -0.8848459 > Sample10 -0.3651829 1.00253167 -0.12004007 0.34972942 -2.1310389 > 0.7162622 -0.19072440 -2.0173607 -0.9407956 -2.4224372 > > I first multiply the matrix by the transposed matrix: > > > xxt=adjusted_dummyset %*% t(adjusted_dummyset) > > xxt > Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 > Sample7 Sample8 Sample9 Sample10 > Sample1 16.045163 -1.1367278 -2.5390033 -1.4762231 1.0158736 -1.8408484 > -5.8176751 -1.5163240 3.5304834 -6.2647180 > Sample2 -1.136728 9.0985066 -5.2175085 -0.8332460 0.7980575 -3.3607995 > 1.6342076 -0.3098849 -0.3462389 -0.3263661 > Sample3 -2.539003 -5.2175085 12.4738749 3.7747189 -0.9562913 -1.3252885 > -0.9175020 0.5849437 -6.0796087 0.2016647 > Sample4 -1.476223 -0.8332460 3.7747189 6.1161097 -1.0232149 -3.0984138 > -1.1213965 -1.9014901 -1.0503243 0.6134802 > Sample5 1.015874 0.7980575 -0.9562913 -1.0232149 6.0758629 0.2183710 > -0.9466710 2.1466445 -0.2626433 -7.0659890 > Sample6 -1.840848 -3.3607995 -1.3252885 -3.0984138 0.2183710 6.6590617 > 3.0772441 1.0977948 0.3980588 -1.8251802 > Sample7 -5.817675 1.6342076 -0.9175020 -1.1213965 -0.9466710 3.0772441 > 5.8681617 -0.9215294 0.1646922 -1.0195316 > Sample8 -1.516324 -0.3098849 0.5849437 -1.9014901 2.1466445 1.0977948 > -0.9215294 3.1757171 -2.2533118 -0.1025599 > Sample9 3.530483 -0.3462389 -6.0796087 -1.0503243 -0.2626433 0.3980588 > 0.1646922 -2.2533118 7.2986178 -1.3997251 > Sample10 -6.264718 -0.3263661 0.2016647 0.6134802 -7.0659890 -1.8251802 > -1.0195316 -0.1025599 -1.3997251 17.1889249 > > Then I apply the Eigen function and save the eigenvectors and eigenvalues > eigen(xxt,TRUE) > $values > [1] 2.693535e+01 1.898845e+01 1.668814e+01 1.184849e+01 8.136377e+00 > 4.893302e+00 2.007293e+00 4.994285e-01 3.173378e-03 -1.652399e-15 > > $vectors > [,1] [,2] [,3] [,4] [,5] > [,6] [,7] [,8] [,9] [,10] > [1,] 0.60510924 0.13494101 -0.54092789 -0.11000076 0.11721749 > 0.37004560 -0.106993925 -0.10503855 0.19436075 -0.3162278 > [2,] 0.05581480 -0.39453663 0.09160465 0.67089969 0.08088382 > 0.32755098 -0.009124933 0.11721316 -0.39379401 -0.3162278 > [3,] -0.28666669 0.71121151 -0.01000162 0.03014755 -0.06261445 > 0.22739798 0.422156670 0.02330019 -0.27677054 -0.3162278 > [4,] -0.13469379 0.23709870 -0.15611654 0.27760975 -0.50201617 > -0.31821931 -0.584748137 0.12405010 0.11661762 -0.3162278 > [5,] 0.26143053 0.12386029 0.29769330 0.19250271 0.27617639 > -0.49754797 0.084826842 -0.59908097 0.02670441 -0.3162278 > [6,] 0.01918933 -0.04796651 0.35560411 -0.57607725 0.07402744 > 0.14338749 -0.455866694 -0.01271614 -0.45276432 -0.3162278 > [7,] -0.11887886 -0.18220751 0.42748177 -0.07360955 -0.34327294 > 0.38589551 0.142875991 -0.16600492 0.59142740 -0.3162278 > [8,] -0.02725610 0.06425762 0.15301980 -0.01332073 0.53079711 > -0.18890512 0.012099700 0.65880817 0.34630946 -0.3162278 > [9,] 0.24706215 -0.31886828 -0.13294196 -0.25401667 -0.42308063 > -0.38135353 0.482282448 0.24225089 -0.19843310 -0.3162278 > [10,] -0.62111060 -0.32779019 -0.48541562 -0.14413474 0.25188193 > -0.06825162 0.012492037 -0.28278192 0.04634233 -0.3162278 > > I also calculate: > > xtx=t(adjusted_dummyset) %*% adjusted_dummyset > > xtx > [,1] [,2] [,3] [,4] [,5] [,6] > [,7] [,8] [,9] [,10] > [1,] 9.0000000 -3.1796310 2.0417037 3.9514142 2.7275520 -1.66571377 > 1.5629736 -0.9104748 -4.8506341 0.68480373 > [2,] -3.1796310 9.0000000 1.0981249 0.5494749 -1.2662064 -1.89318349 > 2.1005566 -0.5052055 -1.7947356 -3.90496189 > [3,] 2.0417037 1.0981249 9.0000000 -1.1689542 2.3836861 1.22540724 > 8.5924385 4.2130304 1.8322104 0.72643120 > [4,] 3.9514142 0.5494749 -1.1689542 9.0000000 -1.7132579 -2.51679155 > -2.8679239 0.5181511 -3.9536139 0.84095707 > [5,] 2.7275520 -1.2662064 2.3836861 -1.7132579 9.0000000 -2.17714376 > 3.1966738 4.1903147 0.7937017 5.20170437 > [6,] -1.6657138 -1.8931835 1.2254072 -2.5167916 -2.1771438 9.00000000 > 0.3764605 -1.9821441 2.3330854 -0.08204933 > [7,] 1.5629736 2.1005566 8.5924385 -2.8679239 3.1966738 0.37646053 > 9.0000000 3.5708620 1.7809104 -0.28160124 > [8,] -0.9104748 -0.5052055 4.2130304 0.5181511 4.1903147 -1.98214408 > 3.5708620 9.0000000 5.3281685 6.32863320 > [9,] -4.8506341 -1.7947356 1.8322104 -3.9536139 0.7937017 2.33308543 > 1.7809104 5.3281685 9.0000000 2.27959505 > [10,] 0.6848037 -3.9049619 0.7264312 0.8409571 5.2017044 -0.08204933 > -0.2816012 6.3286332 2.2795951 9.00000000 > > > and apply the eigen function to this: > > > eigen(xtx,TRUE) > $values > [1] 2.693535e+01 1.898845e+01 1.668814e+01 1.184849e+01 8.136377e+00 > 4.893302e+00 2.007293e+00 4.994285e-01 3.173378e-03 6.009067e-16 > > $vectors > [,1] [,2] [,3] [,4] [,5] > [,6] [,7] [,8] [,9] [,10] > [1,] -0.01604003 0.56323526 -0.190946669 -0.40105857 -0.002543387 > 0.17417910 0.25192284 -0.59870554 0.18090811 -0.01463338 > [2,] 0.08466679 -0.16695878 -0.455330886 0.54953906 -0.002740213 > -0.47379253 0.19091718 -0.36818489 0.23997046 0.03943000 > [3,] -0.42016626 -0.01672674 -0.438168969 -0.16986221 -0.259624942 > 0.03774746 -0.20059192 0.33694834 0.50122476 -0.35847979 > [4,] 0.17380688 0.46248616 -0.009158497 0.19699882 -0.641525880 > -0.08643216 0.32002024 0.38196270 -0.08822512 0.20466364 > [5,] -0.38230998 0.27820625 0.061021059 0.01793892 0.556873562 > -0.31199982 0.48447946 0.33729239 -0.05542533 -0.11569124 > [6,] -0.01078626 -0.35623684 0.086516360 -0.57830505 -0.317806785 > -0.56648878 0.22620425 -0.09749859 -0.18851088 -0.11373558 > [7,] -0.41357826 -0.06924232 -0.495909637 -0.11281244 0.008420391 > 0.07292844 -0.09886884 0.01075021 -0.44335225 0.59469674 > [8,] -0.48705511 0.07929178 0.158922061 0.33799785 -0.279059362 > 0.07881225 -0.04227138 -0.30875079 -0.45445853 -0.47881039 > [9,] -0.33349167 -0.40387477 0.287225004 0.07874711 -0.163416914 > 0.37364320 0.52438989 -0.09014761 0.33696991 0.27201980 > [10,] -0.34649980 0.24943468 0.446371805 0.03960183 -0.072567359 > -0.40852699 -0.43011632 -0.14042978 0.30891048 0.38025985 > > > I create a diagonal matrix containing the square roots of the eigenvalues > along the diagonal: > > > diagonalmatrix_eigenvaluesqrts > V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 > 1 5.61 0.00 0.00 0.000 0.000 0.000 0.000 0.000 0.000 0.0000 > 2 0.00 1.41 0.00 0.000 0.000 0.000 0.000 0.000 0.000 0.0000 > 3 0.00 0.00 1.12 0.000 0.000 0.000 0.000 0.000 0.000 0.0000 > 4 0.00 0.00 0.00 0.941 0.000 0.000 0.000 0.000 0.000 0.0000 > 5 0.00 0.00 0.00 0.000 0.727 0.000 0.000 0.000 0.000 0.0000 > 6 0.00 0.00 0.00 0.000 0.000 0.471 0.000 0.000 0.000 0.0000 > 7 0.00 0.00 0.00 0.000 0.000 0.000 0.296 0.000 0.000 0.0000 > 8 0.00 0.00 0.00 0.000 0.000 0.000 0.000 0.246 0.000 0.0000 > 9 0.00 0.00 0.00 0.000 0.000 0.000 0.000 0.000 0.159 0.0000 > 10 0.00 0.00 0.00 0.000 0.000 0.000 0.000 0.000 0.000 0.0127 > > I have been operating under the assumption that the original data matrix > could be reconstructed by multiplying the eigenvectors from xxt X a > diagonal > matrix with the square roots of the eigenvalues along the diagonal X the > eigenvectors from xtx as follows: > > Reconstructed_dummyset=eigenvectors_xxt %*% > as.matrix(diagonalmatrix_eigenvaluesqrts) %*% t(eigenvectors_xtx) > > Reconstructed_dummyset > [,1] [,2] [,3] [,4] [,5] > [,6] [,7] [,8] [,9] [,10] > [1,] 0.253193770 0.402535986 -1.14743281 0.5698820 -1.31593686 > -0.23278020 -1.10482209 -1.78241789 -1.344859283 -1.4554367 > [2,] -0.593623956 0.320035080 -0.30357878 -0.1156343 -0.25992539 > -0.25866729 -0.17538542 0.05041073 0.221261845 -0.2675080 > [3,] 0.620290857 -0.322491313 0.63113668 0.2551444 0.90025477 > -0.36701269 0.61029139 0.90661927 0.226875492 0.6933749 > [4,] 0.045394154 0.135401863 0.48897212 0.2772398 0.14530592 > -0.12141324 0.33943264 0.53647586 0.002607645 0.4395889 > [5,] -0.007416873 0.213143606 -0.90810973 0.2103079 -0.34091303 > -0.06529543 -0.82817426 -0.61633082 -0.543192842 -0.2177753 > [6,] 0.068080740 -0.533029067 -0.14658587 -0.2022178 -0.09846926 > 0.29932974 -0.12798021 -0.14611192 -0.016267532 0.1053571 > [7,] -0.113178997 -0.309552826 0.18447270 -0.1195559 0.01775153 > 0.15282100 0.02644529 0.41110216 0.623380590 0.3368958 > [8,] -0.076800910 -0.117830736 -0.03085105 -0.1712106 0.39036065 > -0.10570034 -0.05526268 -0.08333122 -0.028936234 0.1511139 > [9,] -0.187036903 0.212383109 -0.41884937 0.2704928 -0.69211148 > 0.50295414 -0.45885438 -0.75166730 -0.301317053 -0.6536756 > [10,] -0.008314191 -0.002179248 1.66522298 -0.9826679 1.25832941 > 0.20033204 1.75042615 1.49448058 1.149522805 0.8527934 > > However as you can see I've clearly got my wires crossed somewhere! > > I'd really appreciate any advice you could offer... Thanks! > > Julia > -- > View this message in context: > http://r.789695.n4.nabble.com/Data-reconstruction-following-PCA-using-Eigen-function-tp2226535p2226535.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.