Re: [R] Data reconstruction following PCA using Eigen function

2010-05-24 Thread Julia El-Sayed Moustafa

Hi Thomas,

Thanks very much for your reply. I used svd and it worked perfectly for my
purposes!

Thanks again,
Julia
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Data-reconstruction-following-PCA-using-Eigen-function-tp2226535p2229191.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data reconstruction following PCA using Eigen function

2010-05-22 Thread Thomas Stewart
Looks like you have some numerical precision issues.  Why not use the svd
function directly?  (See below.)

-tgs

x - read.table(
textConnection(
Sample1   0.7329881  0.76912670  2.45906143 -0.06411602  1.2427801
0.3785717  2.34508664  1.1043552 -0.1883830  0.6503095
Sample2  -2.0446131  1.72783245 -0.40965941 -0.21713655 -0.2386781
-0.1944390 -0.25181541  0.8882743  0.8404783  0.2531209
Sample3   1.7575174  0.03851425 -0.87537424  1.82811160  0.8342636
-0.8155942 -0.90893068 -0.5529098 -1.6586626  0.1761717
Sample4   0.2731749  0.24045167 -0.39913821 -0.48525909  0.1448994
-2.0173360 -0.01073639 -0.3219478 -1.1536431 -0.2521545
Sample5  -0.2014241 -1.04646151  0.28101160  0.74348390  0.1738312
-0.8431262 -0.08842512  1.2909658  1.2013136  0.6706926
Sample6   0.1743534 -1.70657357 -0.09170187 -0.55605031 -0.2940946 1.4525891
-0.39068509 -0.3373913 -0.0533732  0.9658389
Sample7  -0.8533191 -0.34438091 -1.23890437 -0.77360636  0.5926479 0.7742632
-1.12515017 -0.5720099  0.2243808  0.5420693
Sample8   0.4176988 -0.35906123 -0.07190644  0.90045123 -1.0621902 0.2693762
-0.38033715  0.6267548  0.4767652  0.3012347
Sample9   0.1088066 -0.32197951  0.46665158 -1.72560781  0.7375796 0.2794331
 1.00171777 -0.1087306  1.2519195 -0.8848459
Sample10 -0.3651829  1.00253167 -0.12004007  0.34972942 -2.1310389 0.7162622
-0.19072440 -2.0173607 -0.9407956 -2.4224372),
 header=F, as.is=TRUE)

X-as.matrix(x[,-1])

V - t( eigen( t(X) %*% X , T )$vectors )
U - eigen( X %*% t( X ) , T )$vectors
D - diag( sqrt( eigen( X %*% t( X ) , T )$values ) )

X - U %*% D %*% t(V)

svd(X)$v - V
svd(X)$u - U
svd(X)$d - sqrt( eigen( X %*% t( X ) , T )$values )

X-svd(X)$u %*% diag( svd(X)$d ) %*% t( svd(X)$v )






On Fri, May 21, 2010 at 2:25 PM, Julia El-Sayed Moustafa 
jelsa...@imperial.ac.uk wrote:


 Hi all,

 As a molecular biologist by training, I'm fairly new to R (and
 statistics!),
 and was hoping for some advice. First of all, I'd like to apologise if my
 question is more methodological rather than relating to a specific R
 function. I've done my best to search both in the forum and elsewhere but
 can't seem to find an answer which works in practice.

 I am carrying out principal component analysis in R using the Eigen
 function, and have applied this to my dataset and can identify patterns in
 my data which result from batch effects. What I'd ultimately like to do is
 use PCA as a noise filter to remove the batch effects from my data, and the
 way I would like to try and do this is by reconstructing the data, minus
 the
 components (or that part of them) which correlate with batch.

 As I have an extremely large dataset, as a first step I am attempting
 simply
 to apply the Eigen function to a dummy dataset consisting of only 10
 individuals and ten variables derived from my originial data, then
 reconstruct the original dummy data matrix, just to see if I can get the
 basics of the method to work first. Here is the dummy matrix (adjusted by
 variable mean-subtraction and division by the variable standard deviation
 applied to each data point) I am using:

 adjusted_dummyset
   [,1][,2][,3][,4]   [,5]
 [,6][,7]   [,8]   [,9]  [,10]
 Sample1   0.7329881  0.76912670  2.45906143 -0.06411602  1.2427801
 0.3785717  2.34508664  1.1043552 -0.1883830  0.6503095
 Sample2  -2.0446131  1.72783245 -0.40965941 -0.21713655 -0.2386781
 -0.1944390 -0.25181541  0.8882743  0.8404783  0.2531209
 Sample3   1.7575174  0.03851425 -0.87537424  1.82811160  0.8342636
 -0.8155942 -0.90893068 -0.5529098 -1.6586626  0.1761717
 Sample4   0.2731749  0.24045167 -0.39913821 -0.48525909  0.1448994
 -2.0173360 -0.01073639 -0.3219478 -1.1536431 -0.2521545
 Sample5  -0.2014241 -1.04646151  0.28101160  0.74348390  0.1738312
 -0.8431262 -0.08842512  1.2909658  1.2013136  0.6706926
 Sample6   0.1743534 -1.70657357 -0.09170187 -0.55605031 -0.2940946
 1.4525891 -0.39068509 -0.3373913 -0.0533732  0.9658389
 Sample7  -0.8533191 -0.34438091 -1.23890437 -0.77360636  0.5926479
 0.7742632 -1.12515017 -0.5720099  0.2243808  0.5420693
 Sample8   0.4176988 -0.35906123 -0.07190644  0.90045123 -1.0621902
 0.2693762 -0.38033715  0.6267548  0.4767652  0.3012347
 Sample9   0.1088066 -0.32197951  0.46665158 -1.72560781  0.7375796
 0.2794331  1.00171777 -0.1087306  1.2519195 -0.8848459
 Sample10 -0.3651829  1.00253167 -0.12004007  0.34972942 -2.1310389
 0.7162622 -0.19072440 -2.0173607 -0.9407956 -2.4224372

 I first multiply the matrix by the transposed matrix:

  xxt=adjusted_dummyset %*% t(adjusted_dummyset)
  xxt
   Sample1Sample2Sample3Sample4Sample5Sample6
 Sample7Sample8Sample9   Sample10
 Sample1  16.045163 -1.1367278 -2.5390033 -1.4762231  1.0158736 -1.8408484
 -5.8176751 -1.5163240  3.5304834 -6.2647180
 Sample2  -1.136728  9.0985066 -5.2175085 -0.8332460  0.7980575 -3.3607995
 1.6342076 -0.3098849 -0.3462389 -0.3263661
 Sample3  -2.539003 -5.2175085 12.4738749  3.7747189 -0.9562913