Re: [R] Data reconstruction following PCA using Eigen function

Thomas Stewart Sat, 22 May 2010 11:53:01 -0700

Looks like you have some numerical precision issues.  Why not use the svd
function directly?  (See below.)


-tgs

x <- read.table(
textConnection(
"Sample1   0.7329881  0.76912670  2.45906143 -0.06411602  1.2427801
0.3785717  2.34508664  1.1043552 -0.1883830  0.6503095
Sample2  -2.0446131  1.72783245 -0.40965941 -0.21713655 -0.2386781
-0.1944390 -0.25181541  0.8882743  0.8404783  0.2531209
Sample3   1.7575174  0.03851425 -0.87537424  1.82811160  0.8342636
-0.8155942 -0.90893068 -0.5529098 -1.6586626  0.1761717
Sample4   0.2731749  0.24045167 -0.39913821 -0.48525909  0.1448994
-2.0173360 -0.01073639 -0.3219478 -1.1536431 -0.2521545
Sample5  -0.2014241 -1.04646151  0.28101160  0.74348390  0.1738312
-0.8431262 -0.08842512  1.2909658  1.2013136  0.6706926
Sample6   0.1743534 -1.70657357 -0.09170187 -0.55605031 -0.2940946 1.4525891
-0.39068509 -0.3373913 -0.0533732  0.9658389
Sample7  -0.8533191 -0.34438091 -1.23890437 -0.77360636  0.5926479 0.7742632
-1.12515017 -0.5720099  0.2243808  0.5420693
Sample8   0.4176988 -0.35906123 -0.07190644  0.90045123 -1.0621902 0.2693762
-0.38033715  0.6267548  0.4767652  0.3012347
Sample9   0.1088066 -0.32197951  0.46665158 -1.72560781  0.7375796 0.2794331
 1.00171777 -0.1087306  1.2519195 -0.8848459
Sample10 -0.3651829  1.00253167 -0.12004007  0.34972942 -2.1310389 0.7162622
-0.19072440 -2.0173607 -0.9407956 -2.4224372"),
 header=F, as.is=TRUE)

X<-as.matrix(x[,-1])

V <- t( eigen( t(X) %*% X , T )$vectors )
U <- eigen( X %*% t( X ) , T )$vectors
D <- diag( sqrt( eigen( X %*% t( X ) , T )$values ) )

X - U %*% D %*% t(V)

svd(X)$v - V
svd(X)$u - U
svd(X)$d - sqrt( eigen( X %*% t( X ) , T )$values )

X-svd(X)$u %*% diag( svd(X)$d ) %*% t( svd(X)$v )






On Fri, May 21, 2010 at 2:25 PM, Julia El-Sayed Moustafa <
jelsa...@imperial.ac.uk> wrote:

>
> Hi all,
>
> As a molecular biologist by training, I'm fairly new to R (and
> statistics!),
> and was hoping for some advice. First of all, I'd like to apologise if my
> question is more methodological rather than relating to a specific R
> function. I've done my best to search both in the forum and elsewhere but
> can't seem to find an answer which works in practice.
>
> I am carrying out principal component analysis in R using the Eigen
> function, and have applied this to my dataset and can identify patterns in
> my data which result from batch effects. What I'd ultimately like to do is
> use PCA as a noise filter to remove the batch effects from my data, and the
> way I would like to try and do this is by reconstructing the data, minus
> the
> components (or that part of them) which correlate with batch.
>
> As I have an extremely large dataset, as a first step I am attempting
> simply
> to apply the Eigen function to a dummy dataset consisting of only 10
> individuals and ten variables derived from my originial data, then
> reconstruct the original dummy data matrix, just to see if I can get the
> basics of the method to work first. Here is the dummy matrix (adjusted by
> variable mean-subtraction and division by the variable standard deviation
> applied to each data point) I am using:
>
> adjusted_dummyset
>               [,1]        [,2]        [,3]        [,4]       [,5]
> [,6]        [,7]       [,8]       [,9]      [,10]
> Sample1   0.7329881  0.76912670  2.45906143 -0.06411602  1.2427801
> 0.3785717  2.34508664  1.1043552 -0.1883830  0.6503095
> Sample2  -2.0446131  1.72783245 -0.40965941 -0.21713655 -0.2386781
> -0.1944390 -0.25181541  0.8882743  0.8404783  0.2531209
> Sample3   1.7575174  0.03851425 -0.87537424  1.82811160  0.8342636
> -0.8155942 -0.90893068 -0.5529098 -1.6586626  0.1761717
> Sample4   0.2731749  0.24045167 -0.39913821 -0.48525909  0.1448994
> -2.0173360 -0.01073639 -0.3219478 -1.1536431 -0.2521545
> Sample5  -0.2014241 -1.04646151  0.28101160  0.74348390  0.1738312
> -0.8431262 -0.08842512  1.2909658  1.2013136  0.6706926
> Sample6   0.1743534 -1.70657357 -0.09170187 -0.55605031 -0.2940946
> 1.4525891 -0.39068509 -0.3373913 -0.0533732  0.9658389
> Sample7  -0.8533191 -0.34438091 -1.23890437 -0.77360636  0.5926479
> 0.7742632 -1.12515017 -0.5720099  0.2243808  0.5420693
> Sample8   0.4176988 -0.35906123 -0.07190644  0.90045123 -1.0621902
> 0.2693762 -0.38033715  0.6267548  0.4767652  0.3012347
> Sample9   0.1088066 -0.32197951  0.46665158 -1.72560781  0.7375796
> 0.2794331  1.00171777 -0.1087306  1.2519195 -0.8848459
> Sample10 -0.3651829  1.00253167 -0.12004007  0.34972942 -2.1310389
> 0.7162622 -0.19072440 -2.0173607 -0.9407956 -2.4224372
>
> I first multiply the matrix by the transposed matrix:
>
> > xxt=adjusted_dummyset %*% t(adjusted_dummyset)
> > xxt
>           Sample1    Sample2    Sample3    Sample4    Sample5    Sample6
> Sample7    Sample8    Sample9   Sample10
> Sample1  16.045163 -1.1367278 -2.5390033 -1.4762231  1.0158736 -1.8408484
> -5.8176751 -1.5163240  3.5304834 -6.2647180
> Sample2  -1.136728  9.0985066 -5.2175085 -0.8332460  0.7980575 -3.3607995
> 1.6342076 -0.3098849 -0.3462389 -0.3263661
> Sample3  -2.539003 -5.2175085 12.4738749  3.7747189 -0.9562913 -1.3252885
> -0.9175020  0.5849437 -6.0796087  0.2016647
> Sample4  -1.476223 -0.8332460  3.7747189  6.1161097 -1.0232149 -3.0984138
> -1.1213965 -1.9014901 -1.0503243  0.6134802
> Sample5   1.015874  0.7980575 -0.9562913 -1.0232149  6.0758629  0.2183710
> -0.9466710  2.1466445 -0.2626433 -7.0659890
> Sample6  -1.840848 -3.3607995 -1.3252885 -3.0984138  0.2183710  6.6590617
> 3.0772441  1.0977948  0.3980588 -1.8251802
> Sample7  -5.817675  1.6342076 -0.9175020 -1.1213965 -0.9466710  3.0772441
> 5.8681617 -0.9215294  0.1646922 -1.0195316
> Sample8  -1.516324 -0.3098849  0.5849437 -1.9014901  2.1466445  1.0977948
> -0.9215294  3.1757171 -2.2533118 -0.1025599
> Sample9   3.530483 -0.3462389 -6.0796087 -1.0503243 -0.2626433  0.3980588
> 0.1646922 -2.2533118  7.2986178 -1.3997251
> Sample10 -6.264718 -0.3263661  0.2016647  0.6134802 -7.0659890 -1.8251802
> -1.0195316 -0.1025599 -1.3997251 17.1889249
>
> Then I apply the Eigen function and save the eigenvectors and eigenvalues
> eigen(xxt,TRUE)
> $values
>  [1]  2.693535e+01  1.898845e+01  1.668814e+01  1.184849e+01  8.136377e+00
> 4.893302e+00  2.007293e+00  4.994285e-01  3.173378e-03 -1.652399e-15
>
> $vectors
>             [,1]        [,2]        [,3]        [,4]        [,5]
> [,6]         [,7]        [,8]        [,9]      [,10]
>  [1,]  0.60510924  0.13494101 -0.54092789 -0.11000076  0.11721749
> 0.37004560 -0.106993925 -0.10503855  0.19436075 -0.3162278
>  [2,]  0.05581480 -0.39453663  0.09160465  0.67089969  0.08088382
> 0.32755098 -0.009124933  0.11721316 -0.39379401 -0.3162278
>  [3,] -0.28666669  0.71121151 -0.01000162  0.03014755 -0.06261445
> 0.22739798  0.422156670  0.02330019 -0.27677054 -0.3162278
>  [4,] -0.13469379  0.23709870 -0.15611654  0.27760975 -0.50201617
> -0.31821931 -0.584748137  0.12405010  0.11661762 -0.3162278
>  [5,]  0.26143053  0.12386029  0.29769330  0.19250271  0.27617639
> -0.49754797  0.084826842 -0.59908097  0.02670441 -0.3162278
>  [6,]  0.01918933 -0.04796651  0.35560411 -0.57607725  0.07402744
> 0.14338749 -0.455866694 -0.01271614 -0.45276432 -0.3162278
>  [7,] -0.11887886 -0.18220751  0.42748177 -0.07360955 -0.34327294
> 0.38589551  0.142875991 -0.16600492  0.59142740 -0.3162278
>  [8,] -0.02725610  0.06425762  0.15301980 -0.01332073  0.53079711
> -0.18890512  0.012099700  0.65880817  0.34630946 -0.3162278
>  [9,]  0.24706215 -0.31886828 -0.13294196 -0.25401667 -0.42308063
> -0.38135353  0.482282448  0.24225089 -0.19843310 -0.3162278
> [10,] -0.62111060 -0.32779019 -0.48541562 -0.14413474  0.25188193
> -0.06825162  0.012492037 -0.28278192  0.04634233 -0.3162278
>
> I also calculate:
>
> xtx=t(adjusted_dummyset) %*% adjusted_dummyset
> > xtx
>            [,1]       [,2]       [,3]       [,4]       [,5]        [,6]
> [,7]       [,8]       [,9]       [,10]
>  [1,]  9.0000000 -3.1796310  2.0417037  3.9514142  2.7275520 -1.66571377
> 1.5629736 -0.9104748 -4.8506341  0.68480373
>  [2,] -3.1796310  9.0000000  1.0981249  0.5494749 -1.2662064 -1.89318349
> 2.1005566 -0.5052055 -1.7947356 -3.90496189
>  [3,]  2.0417037  1.0981249  9.0000000 -1.1689542  2.3836861  1.22540724
> 8.5924385  4.2130304  1.8322104  0.72643120
>  [4,]  3.9514142  0.5494749 -1.1689542  9.0000000 -1.7132579 -2.51679155
> -2.8679239  0.5181511 -3.9536139  0.84095707
>  [5,]  2.7275520 -1.2662064  2.3836861 -1.7132579  9.0000000 -2.17714376
> 3.1966738  4.1903147  0.7937017  5.20170437
>  [6,] -1.6657138 -1.8931835  1.2254072 -2.5167916 -2.1771438  9.00000000
> 0.3764605 -1.9821441  2.3330854 -0.08204933
>  [7,]  1.5629736  2.1005566  8.5924385 -2.8679239  3.1966738  0.37646053
> 9.0000000  3.5708620  1.7809104 -0.28160124
>  [8,] -0.9104748 -0.5052055  4.2130304  0.5181511  4.1903147 -1.98214408
> 3.5708620  9.0000000  5.3281685  6.32863320
>  [9,] -4.8506341 -1.7947356  1.8322104 -3.9536139  0.7937017  2.33308543
> 1.7809104  5.3281685  9.0000000  2.27959505
> [10,]  0.6848037 -3.9049619  0.7264312  0.8409571  5.2017044 -0.08204933
> -0.2816012  6.3286332  2.2795951  9.00000000
> >
> and apply the eigen function to this:
>
> > eigen(xtx,TRUE)
> $values
>  [1] 2.693535e+01 1.898845e+01 1.668814e+01 1.184849e+01 8.136377e+00
> 4.893302e+00 2.007293e+00 4.994285e-01 3.173378e-03 6.009067e-16
>
> $vectors
>             [,1]        [,2]         [,3]        [,4]         [,5]
> [,6]        [,7]        [,8]        [,9]       [,10]
>  [1,] -0.01604003  0.56323526 -0.190946669 -0.40105857 -0.002543387
> 0.17417910  0.25192284 -0.59870554  0.18090811 -0.01463338
>  [2,]  0.08466679 -0.16695878 -0.455330886  0.54953906 -0.002740213
> -0.47379253  0.19091718 -0.36818489  0.23997046  0.03943000
>  [3,] -0.42016626 -0.01672674 -0.438168969 -0.16986221 -0.259624942
> 0.03774746 -0.20059192  0.33694834  0.50122476 -0.35847979
>  [4,]  0.17380688  0.46248616 -0.009158497  0.19699882 -0.641525880
> -0.08643216  0.32002024  0.38196270 -0.08822512  0.20466364
>  [5,] -0.38230998  0.27820625  0.061021059  0.01793892  0.556873562
> -0.31199982  0.48447946  0.33729239 -0.05542533 -0.11569124
>  [6,] -0.01078626 -0.35623684  0.086516360 -0.57830505 -0.317806785
> -0.56648878  0.22620425 -0.09749859 -0.18851088 -0.11373558
>  [7,] -0.41357826 -0.06924232 -0.495909637 -0.11281244  0.008420391
> 0.07292844 -0.09886884  0.01075021 -0.44335225  0.59469674
>  [8,] -0.48705511  0.07929178  0.158922061  0.33799785 -0.279059362
> 0.07881225 -0.04227138 -0.30875079 -0.45445853 -0.47881039
>  [9,] -0.33349167 -0.40387477  0.287225004  0.07874711 -0.163416914
> 0.37364320  0.52438989 -0.09014761  0.33696991  0.27201980
> [10,] -0.34649980  0.24943468  0.446371805  0.03960183 -0.072567359
> -0.40852699 -0.43011632 -0.14042978  0.30891048  0.38025985
>
>
> I create a diagonal matrix containing the square  roots of the eigenvalues
> along the diagonal:
>
> > diagonalmatrix_eigenvaluesqrts
>     V1   V2   V3    V4    V5    V6    V7    V8    V9    V10
> 1  5.61 0.00 0.00 0.000 0.000 0.000 0.000 0.000 0.000 0.0000
> 2  0.00 1.41 0.00 0.000 0.000 0.000 0.000 0.000 0.000 0.0000
> 3  0.00 0.00 1.12 0.000 0.000 0.000 0.000 0.000 0.000 0.0000
> 4  0.00 0.00 0.00 0.941 0.000 0.000 0.000 0.000 0.000 0.0000
> 5  0.00 0.00 0.00 0.000 0.727 0.000 0.000 0.000 0.000 0.0000
> 6  0.00 0.00 0.00 0.000 0.000 0.471 0.000 0.000 0.000 0.0000
> 7  0.00 0.00 0.00 0.000 0.000 0.000 0.296 0.000 0.000 0.0000
> 8  0.00 0.00 0.00 0.000 0.000 0.000 0.000 0.246 0.000 0.0000
> 9  0.00 0.00 0.00 0.000 0.000 0.000 0.000 0.000 0.159 0.0000
> 10 0.00 0.00 0.00 0.000 0.000 0.000 0.000 0.000 0.000 0.0127
>
> I have been operating under the assumption that the original data matrix
> could be reconstructed by multiplying the eigenvectors from xxt X a
> diagonal
> matrix with the square roots of the eigenvalues along the diagonal X the
> eigenvectors from xtx as follows:
>
>  Reconstructed_dummyset=eigenvectors_xxt %*%
> as.matrix(diagonalmatrix_eigenvaluesqrts) %*% t(eigenvectors_xtx)
> > Reconstructed_dummyset
>              [,1]         [,2]        [,3]       [,4]        [,5]
> [,6]        [,7]        [,8]         [,9]      [,10]
>  [1,]  0.253193770  0.402535986 -1.14743281  0.5698820 -1.31593686
> -0.23278020 -1.10482209 -1.78241789 -1.344859283 -1.4554367
>  [2,] -0.593623956  0.320035080 -0.30357878 -0.1156343 -0.25992539
> -0.25866729 -0.17538542  0.05041073  0.221261845 -0.2675080
>  [3,]  0.620290857 -0.322491313  0.63113668  0.2551444  0.90025477
> -0.36701269  0.61029139  0.90661927  0.226875492  0.6933749
>  [4,]  0.045394154  0.135401863  0.48897212  0.2772398  0.14530592
> -0.12141324  0.33943264  0.53647586  0.002607645  0.4395889
>  [5,] -0.007416873  0.213143606 -0.90810973  0.2103079 -0.34091303
> -0.06529543 -0.82817426 -0.61633082 -0.543192842 -0.2177753
>  [6,]  0.068080740 -0.533029067 -0.14658587 -0.2022178 -0.09846926
> 0.29932974 -0.12798021 -0.14611192 -0.016267532  0.1053571
>  [7,] -0.113178997 -0.309552826  0.18447270 -0.1195559  0.01775153
> 0.15282100  0.02644529  0.41110216  0.623380590  0.3368958
>  [8,] -0.076800910 -0.117830736 -0.03085105 -0.1712106  0.39036065
> -0.10570034 -0.05526268 -0.08333122 -0.028936234  0.1511139
>  [9,] -0.187036903  0.212383109 -0.41884937  0.2704928 -0.69211148
> 0.50295414 -0.45885438 -0.75166730 -0.301317053 -0.6536756
> [10,] -0.008314191 -0.002179248  1.66522298 -0.9826679  1.25832941
> 0.20033204  1.75042615  1.49448058  1.149522805  0.8527934
>
> However as you can see I've clearly got my wires crossed somewhere!
>
> I'd really appreciate any advice you could offer... Thanks!
>
> Julia
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Data-reconstruction-following-PCA-using-Eigen-function-tp2226535p2226535.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Data reconstruction following PCA using Eigen function

Reply via email to