Re: [R] after PCA, the pc values are so large, wrong?

2009-11-08 Thread bbslover

OK, I understand what you mean; maybe PLS is better suited to my aim. But I
have tried that, and the results were also bad. My main question is how to
select a smaller set of the independent variables to fit the dependent
variable. GA may be a good way, but I have not learned it well.

Ben Bolker wrote:
 
 bbslover <dluthm at yeah.net> writes:
 
 
 [snip]
 
 the fit result below:
 Call:
 lm(formula = y ~ x1 + x2 + x3, data = pc)
 
 Residuals:
      Min       1Q   Median       3Q      Max 
 -1.29638 -0.47622  0.01059  0.49268  1.69335 
 
 Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
 (Intercept)  5.613e+00  8.143e-02  68.932  < 2e-16 ***
 x1          -3.089e-05  5.150e-06  -5.998 8.58e-08 ***
 x2          -4.095e-05  3.448e-05  -1.188    0.239    
 x3          -8.106e-05  6.412e-05  -1.264    0.210    
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
 
 Residual standard error: 0.691 on 68 degrees of freedom
 Multiple R-squared: 0.3644, Adjusted R-squared: 0.3364 
 F-statistic: 12.99 on 3 and 68 DF,  p-value: 8.368e-07 
 
 x2 and x3 are not significant. In principle, after PCA, the PCs should be
 significant, but with my data they are not. Why?
 
   Why is it necessary that the first few principal components
 should have significant relationships with some other response
 values?  The strength, and weakness, of PCA is that it is
 calculated *without regard* to a response variable, so it
 does not constitute data snooping ... 
   I may of course have misinterpreted your question, but at
 a quick look, I don't see anything obviously wrong here.
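
 The point that PCA is computed without regard to the response can be
 illustrated with a small simulation. This is only a sketch on made-up data:
 the names `X`, `y_sim`, and the chosen standard deviations and coefficient
 are hypothetical and have nothing to do with the poster's data set. The
 response is built to depend on the low-variance column, so the leading
 component carries most of the variance in `X` while the second component
 carries most of the information about the response.

```r
set.seed(42)
n <- 200

# Hypothetical predictors: two columns with very different variances
X <- cbind(big = rnorm(n, sd = 10), small = rnorm(n, sd = 1))

# The simulated response depends only on the low-variance column
y_sim <- 2 * X[, "small"] + rnorm(n, sd = 0.5)

# prcomp() sees only X; y_sim plays no part in computing the components
pca <- prcomp(X)
scores <- data.frame(y_sim = y_sim, pca$x)  # columns: y_sim, PC1, PC2

# Share of the total variance in X carried by each component
var_share <- pca$sdev^2 / sum(pca$sdev^2)

# R-squared when regressing the response on each component separately
r2_pc1 <- summary(lm(y_sim ~ PC1, data = scores))$r.squared
r2_pc2 <- summary(lm(y_sim ~ PC2, data = scores))$r.squared

print(var_share)
print(c(PC1 = r2_pc1, PC2 = r2_pc2))
```

 Comparing `var_share` with the two R-squared values shows that ranking
 components by variance explained in `X` is not the same as ranking them by
 usefulness for predicting a response.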
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://old.nabble.com/after-PCA%2C-the-pc-values-are-so-large%2C-wrong--tp26240926p26251658.html
Sent from the R help mailing list archive at Nabble.com.



[R] after PCA, the pc values are so large, wrong?

2009-11-07 Thread bbslover

rm(list = ls())
yx.df <- read.csv("c:/MK-2-72.csv", sep = ",", header = TRUE, dec = ".")
dim(yx.df)
# get the response (column 1) and the X matrix (columns 2 to 643)
y <- yx.df[, 1]
x <- yx.df[, 2:643]
# convert to matrix
mat <- as.matrix(x)
# get the number of rows
rownum <- nrow(mat)
# remove the constant columns
mat1 <- mat[, apply(mat, 2, function(.col) !(all(.col[1] == .col[2:rownum])))]
dim(yx.df)
dim(mat1)
# remove columns where the fraction of zeros exceeds 0.95
mat2 <- mat1[, apply(mat1, 2, function(.col) !(sum(.col == 0) / rownum > 0.95))]
dim(yx.df)
dim(mat2)
# remove columns whose sd is below 0.5
mat3 <- mat2[, apply(mat2, 2, function(.col) !all(sd(.col) < 0.5))]
dim(yx.df)
dim(mat3)
# PCA
mat3.pr <- prcomp(mat3, cor = TRUE)
summary(mat3.pr, loading = TRUE)
pre.cmp <- predict(mat3.pr)
# keep the first three components
cmp <- pre.cmp[, 1:3]
cmp
DF <- cbind(y, cmp)
DF <- as.data.frame(DF)
names(DF) <- c("y", "p1", "p2", "p3")
DF
summary(lm(y ~ p1 + p2 + p3, data = DF))
# second PCA, this time on DF (y plus the three components)
mat3.pr <- prcomp(DF, cor = TRUE)
summary(mat3.pr)
pre <- predict(mat3.pr)
pre1 <- pre[, 1:3]
pre1
colnames(pre1) <- c("x1", "x2", "x3")
pre1
pc <- cbind(y, pre1)
pc <- as.data.frame(pc)
lm.pc <- lm(y ~ x1 + x2 + x3, data = pc)
summary(lm.pc)

Above is my PCA code. After running it, the first three PCs are very large.
Why? The R-squared of the fit is also poor. Below are my values for the first
three PCs.
> pre1
              PC1          PC2          PC3
 [1,] -15181.5190  1944.392700 -1074.326182
 [2,] -32152.4533  1007.113729  3201.361408
 [3,] -15836.5362  2117.988273  -555.799383
 [4,]  -1618.5561  1481.020337   255.530132
 [5,]  -5407.5030  1975.779398   -84.646283
 [6,]  -9662.1949  2611.220928  -417.435782
 [7,] -30488.2102   577.385588  1853.420297
 [8,]  -2135.2563 -4506.112873  1382.413284
 [9,]  -1584.2796 -4645.142062   929.146895
[10,]   -668.7664 -4876.250486   177.691446
[11,]  -2188.5914 -4495.203080  1432.428127
[12,] -19633.9581  2159.000138 -1598.710872
[13,] -26849.1088  -515.574085 -2683.552623
[14,]  -9492.9503 -4868.648205  1236.986097
[15,] -13857.6517 -4810.228193  1296.342199
[16,] -11596.5097 -8181.631403   462.913210
[17,] -25948.6564  -746.442386 -3415.426682
[18,]  15386.4477   709.974524   555.160973
[19,]  21642.7516  1163.456075  -609.437740
[20,]  22236.7094   675.562564  -136.992578
[21,]  14354.9927   611.996274    -4.867054
[22,]  12569.9493  .842240   585.540985
[23,]  20739.0219  3078.679745  1662.902248
[24,]   9472.0249   648.769910   381.487034
[25,]  17299.5307  1424.712428  1522.311676
[26,]  13231.2735   587.761915   170.448061
[27,]  10843.5590   705.485396   -79.931518
[28,]   9402.8803 -1978.216853 -1534.244078
[29,]  13094.9525   212.042937  -363.941664
[30,]   9337.3522   537.885230   189.558999
[31,]   7747.1347  -141.004825 -1664.082447
[32,]   4640.1161 -1489.652284 -3584.574135
[33,]  13241.5054   175.630689  -486.250927
[34,]   3867.2204   814.830143  1584.358007
[35,]   8614.5030   708.274447   814.295587
[36,] -18815.6774  -480.311541  1248.369916
[37,]  -1860.0810  1195.557861   269.322703
[38,]   7172.0057 4.216905 -1191.448702
[39,]  -7233.2271 -2361.951658  -235.293358
[40,]   1841.3548  1187.225488   632.116420
[41,]  12465.2336   367.822405   160.751014
[42,] -39021.7259  1972.333778  3167.504098
[43,]  13098.7736  -424.152058  -567.846037
[44,]   9793.7729  -559.084900  -210.696126
[45,]  13111.1861    22.772626  -318.242722
[46,]  13169.0604 7.808885  -363.995563
[47,]   3306.6293  -694.908211  -642.996604
[48,]  10779.8582  -989.175596 -1619.861931
[49,]  10872.6913  -747.979343 -1375.317959
[50,]  -3057.5633  1838.449143  1454.886518
[51,]  -6854.9316  2338.753165  1113.510561
[52,] -15077.1823  1917.776905 -1158.158633
[53,] -45862.8305  1173.157521 -1707.293955
[54,] -14294.1553  1716.708462 -1794.064434
[55,]  24645.0508  2519.904889  1424.233563
[56,]  23303.5998  2250.088386   839.587354
[57,]  18865.5231   897.566446    36.240598
[58,]    227.2659 -6582.661199  -712.892569
[59,]  15336.8371   722.953549   593.903314
[60,]  13030.8715   228.509670  -312.933654
[61,]   5826.0388   331.077814   -53.417878
[62,]  13150.4446  -437.612023  -608.342969
[63,]  11728.3897   -83.151510   569.007995
[64,]  11021.5720  -869.425283 -1216.724017
[65,]   9625.3142   137.388994   138.735249
[66,] -15905.2704  3735.547166   421.846379
[67,] -15539.7628  3331.399648   104.886572
[68,]  -2294.9924  1648.164750   822.075221
[69,] -10120.0153  1558.766306  -333.378256
[70,] -24241.4554  -533.700229  1516.603088
[71,]  -1036.6022 -4782.136067   475.195011
[72,] -24575.2244  2655.599986 -1965.946921
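
Scores on the order of 10^4, as in the table above, are what an unstandardized
PCA tends to produce when some columns have large variances. One possible
explanation, offered as a sketch rather than a diagnosis of this data set:
`prcomp()` has no `cor` argument (that argument belongs to `princomp()`), and
standardizing in `prcomp()` is requested with `scale. = TRUE`. The matrix
`toy` below is made up for illustration and is not the poster's data.

```r
set.seed(1)

# Hypothetical toy data: 72 rows, three columns on very different scales
toy <- cbind(a = rnorm(72, sd = 1),
             b = rnorm(72, sd = 100),
             c = rnorm(72, sd = 10000))

# Default prcomp(): columns are centered but not scaled,
# so the scores inherit the scale of the largest column
pca_raw <- prcomp(toy)

# scale. = TRUE standardizes each column to unit variance first,
# which corresponds to a PCA on the correlation matrix
pca_std <- prcomp(toy, scale. = TRUE)

# Compare the spread of the first component's scores in the two fits
print(range(pca_raw$x[, 1]))
print(range(pca_std$x[, 1]))

# Component variances: for the standardized fit they sum to the number of columns
print(pca_raw$sdev^2)
print(pca_std$sdev^2)
```

Large scores are not wrong in themselves; they only reflect the units of the
input columns. Whether standardizing is appropriate depends on whether the
variables are measured on comparable scales.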

the fit result below:
Call:
lm(formula = y ~ x1 + x2 + x3, data = pc)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.29638 -0.47622  0.01059  0.49268  1.69335 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.613e+00  8.143e-02  68.932  < 2e-16 ***
x1          -3.089e-05  5.150e-06  -5.998 8.58e-08 ***
x2          -4.095e-05  3.448e-05  -1.188    0.239    
x3          -8.106e-05  6.412e-05  -1.264    0.210    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.691 on 68 degrees of freedom
Multiple R-squared: 0.3644, Adjusted R-squared: