Re: [R] how to tell if its better to standardize your data matrix first when you do principal

2009-11-23 Thread masterinex



Actually, it's for an assignment, Michael. All I'm looking for is some help and
suggestions; please don't get me wrong. I do believe that this is a helpful
community.


> 
> This sounds a bit like homework. If that is the case, please ask your
> teacher rather than this list.
> Anyway, it does not make sense to predict weight using a linear
> combination (principal component) that contains weight, does it?
> 
> Uwe Ligges

It's likely to have been homework: a quick search on "masterinex"
"xevilgang79" reveals which university this undergraduate student is at. It
also produces a phone number, which can be used to look up an address, and a
cell phone number.

MK
__
R-help@r-project.org mailing list

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



-- 
View this message in context: 
http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26490273.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to tell if its better to standardize your data matrix first when you do principal

2009-11-22 Thread masterinex

This is what my data matrix looks like. These are just the first 10
observations, but the pattern is similar for the other observations.


1   12.3  154.25  67.75  36.2   93.1   85.2   94.5  59.0  37.3  21.9  32.0  27.4  17.1
2    6.1  173.25  72.25  38.5   93.6   83.0   98.7  58.7  37.3  23.4  30.5  28.9  18.2
3   25.3  154.00  66.25  34.0   95.8   87.9   99.2  59.6  38.9  24.0  28.8  25.2  16.6
4   10.4  184.75  72.25  37.4  101.8   86.4  101.2  60.1  37.3  22.8  32.4  29.4  18.2
5   28.7  184.25  71.25  34.4   97.3  100.0  101.9  63.2  42.2  24.0  32.2  27.7  17.7
6   20.9  210.25  74.75  39.0  104.5   94.4  107.8  66.0  42.0  25.6  35.7  30.6  18.8
7   19.2  181.00  69.75  36.4  105.1   90.7  100.3  58.4  38.3  22.9  31.9  27.8  17.7
8   12.4  176.00  72.50  37.8   99.6   88.5   97.1  60.0  39.4  23.2  30.5  29.0  18.8
9    4.1  191.00  74.00  38.1  100.9   82.5   99.9  62.9  38.3  23.8  35.9  31.1  18.2
10  11.7  198.25  73.50  42.1   99.6   88.6  104.1  63.1  41.7  25.0  35.6  30.0  19.2


And after standardizing it:

1  -0.831228836 -0.898881671 -0.98330178 -0.77420686 -0.952294055 -0.712961621 -0.814552365 -0.0625400993 -0.53901713 -0.825399059 -0.08244945
2  -1.588060506 -0.185928394  0.75868364  0.23560461 -0.889886435 -0.931523054 -0.155497233 -0.1252522485 -0.53901713  0.295114747 -0.59529632
3   0.755676279 -0.908262635 -1.56396359 -1.74011349 -0.615292906 -0.444727135 -0.077038289  0.0628841989  0.15515266  0.743320270 -1.17652277
4  -1.063161122  0.245595958  0.75868364 -0.24734870  0.133598535 -0.593746294  0.236797489  0.1674044475 -0.53901713 -0.153090775  0.05430971
5   1.170713001  0.226834030  0.37157577 -1.56449410 -0.428070046  0.757360745  0.346640011  0.8154299886  1.58687786  0.743320270 -0.01406987
6   0.218569932  1.202454304  1.72645331  0.45512884  0.470599683  0.201022552  1.27244      1.4007433805  1.50010664  1.938534997  1.18257281
7   0.011051571  0.104881496 -0.20908604 -0.68639717  0.545488828 -0.166558039  0.095571389 -0.1879643976 -0.10516101 -0.078389855 -0.11663925
8  -0.819021874 -0.082737788  0.85546060 -0.07172932 -0.140994994 -0.385119472 -0.406565855  0.1465003978  0.37208072  0.145712907 -0.59529632
9  -1.832199755  0.480120063  1.43612241  0.05998522  0.021264819 -0.981196107  0.032804234  0.7527178395 -0.10516101  0.593918429  1.25095239
10 -0.904470611  0.752168024  1.24256848  1.81617909 -0.140994994 -0.375184861  0.691859366  0.7945259389  1.36994980  1.490329474  1.14838302
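
For readers following along, here is a minimal sketch of what standardizing
with scale() does, using only the first three columns of the ten rows posted
above. The column names v1, v2, v3 are placeholders (the post does not name its
columns), and because the posted standardized values were presumably computed
from all observations, the numbers produced here will not match them.

```r
# First 10 observations of the first three posted columns.
# v1..v3 are placeholder names, not the poster's variable names.
m <- cbind(
  v1 = c(12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, 11.7),
  v2 = c(154.25, 173.25, 154.00, 184.75, 184.25, 210.25, 181.00, 176.00, 191.00, 198.25),
  v3 = c(67.75, 72.25, 66.25, 72.25, 71.25, 74.75, 69.75, 72.50, 74.00, 73.50)
)

# scale() centres each column and divides it by its sample standard deviation
s <- scale(m)

# The same computation written out by hand
centred <- sweep(m, 2, colMeans(m))                 # subtract column means
by_hand <- sweep(centred, 2, apply(m, 2, sd), "/")  # divide by column sds

round(colMeans(s), 10)  # column means after scaling
apply(s, 2, sd)         # column standard deviations after scaling
```

After scaling, every column has mean 0 and standard deviation 1, so no single
variable can dominate the analysis merely because it is measured on a larger
scale.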



This is the result of applying PCA to the data matrix:

Standard deviations:
 [1] 30.6645414  7.5513852  3.6927427  2.8703435  2.5363007  1.9136933  1.5624131  1.3689630  1.2976189
[10]  1.1633458  1.1118231  0.7847148  0.4802303

Rotation:
              PC1         PC2         PC3          PC4          PC5          PC6          PC7         PC8
var1   0.18110712 -0.74864138 -0.46070566 -0.365658769  0.192810075 -0.132529979  0.023764851  0.03674873
var2   0.86458284  0.34243386 -0.05766909 -0.235504989 -0.046075934  0.001493006 -0.024535011  0.13439659
var3   0.03765598  0.20097537 -0.15709612 -0.343218776 -0.295201121 -0.073295697 -0.086930370 -0.54389141
var4   0.05965733  0.01737951  0.09854179 -0.030801791  0.125735684  0.341795876 -0.001735808  0.37152696
var5   0.23845698 -0.20616399  0.68948870  0.025904812  0.391188182 -0.428933369 -0.101780281 -0.16965893
var6   0.29928369 -0.47394636  0.24791449  0.341235161 -0.511378719  0.447071255 -0.077534385 -0.13198544
var7   0.19503685  0.01385823 -0.24126047  0.531403827 -0.127426510 -0.410568454  0.608163973 -0.01265457
var8   0.13261863  0.06839078 -0.37740589  0.535332339  0.366103479  0.032376851 -0.574484605 -0.05645694
var9   0.06246705  0.04407384 -0.09545362  0.037993146 -0.036651080  0.012347288 -0.192976142 -0.13027876
var10  0.03027791  0.05533988 -0.03749859 -0.009257423  0.011026593 -0.010770032 -0.104041067  0.12125263
var11  0.07435322  0.04334969 -0.02666944  0.032036374  0.464035624  0.454970952  0.347507539 -0.60527541
var12  0.04328710  0.04731771  0.00360668 -0.054200633  0.275901346  0.297800123  0.324323749  0.30487145
var13  0.02095652  0.02146485  0.03598618 -0.022510780  0.005192075  0.103988977  0.031541374  0.07877455

               PC9         PC10         PC11        PC12         PC13
var1  -0.005328345  0.030549780 -0.049283616 -0.02211988  0.015660892
var2   0.170766596 -0.144031738  0.028862963  0.06984674  0.006293703
var3  -0.282549313  0.548650592  0.131284937 -0.14740722 -0.002384605
var4   0.024070488  0.614154008 -0.551480394 -0.03446124 -0.178123011
var5  -0.157551008  0.147685248  0.008044148 -0.04068258  0.007778992
var6  -0.058675551  0.006344813  0.130814072 -0.04088919 -0.028655330
var7  -0.099243751  0.171852216 -0.149231752 -0.06690208 -0.014693444
var8   0.006629025  0.199158097  0.187226774 -0.02511968  0.070896819
var9  -0.658214712 -0.320120384 -0.53990      0.37630539 -0.023642902
var10 -0.259704149 -0.273030750 -0.074006053 -0.83676032 -0.348034215
var11  0.157450716
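
The "Standard deviations" line in the output above is enough to work out how
much variance each component carries: square each value to get the eigenvalue,
then divide by the total. A small sketch using exactly the thirteen posted
numbers (the variable names sdev, eig and prop are just illustrative):

```r
# Standard deviations of the principal components, copied from the post
sdev <- c(30.6645414, 7.5513852, 3.6927427, 2.8703435, 2.5363007,
          1.9136933, 1.5624131, 1.3689630, 1.2976189, 1.1633458,
          1.1118231, 0.7847148, 0.4802303)

eig  <- sdev^2           # eigenvalues = variances along each component
prop <- eig / sum(eig)   # proportion of total variance per component

round(prop, 3)           # share of each component
round(cumsum(prop), 3)   # cumulative share
```

On these numbers the first component alone accounts for roughly nine tenths of
the total variance. In the rotation above, var2 has by far the largest PC1
loading (about 0.86), which is what one would expect when one column has a much
larger raw variance than the others and the data were not standardized.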

Re: [R] how to tell if its better to standardize your data matrix first when you do principal

2009-11-22 Thread masterinex

Hi Hadley,

I really appreciate the suggestions you gave. They were helpful, but I still
didn't quite get it all. I really want to do a good job, so any comments would
be very helpful. Please understand.





hadley wrote:
> 
> You've asked the same question on stackoverflow.com and received the
> same answer.  This is rude because it duplicates effort.  If you
> urgently need a response to a question, perhaps you should consider
> paying for it.
> 
> Hadley
> 
> On Sun, Nov 22, 2009 at 12:04 PM, masterinex wrote:
>>
>> So in which cases is it better to standardize the data matrix first?
>> Also, is PCA generally used to predict the response variable, and should I
>> keep that variable in my data matrix?
> 
> 
> 
> -- 
> http://had.co.nz/

-- 
View this message in context: 
http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26471673.html


Re: [R] how to tell if its better to standardize your data matrix first when you do principal

2009-11-22 Thread masterinex

So in which cases is it better to standardize the data matrix first?
Also, is PCA generally used to predict the response variable, and should I
keep that variable in my data matrix?
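
On the second question, one common pattern is principal component regression:
run PCA on the predictors only and then regress the response on the leading
component scores. A minimal sketch, assuming the built-in mtcars data as a
stand-in (the poster's data are not available here), with mpg playing the role
of the response and an arbitrary choice of two components:

```r
# Stand-in data: mpg is treated as the response, the rest as predictors
dat <- mtcars[, c("mpg", "wt", "hp", "disp", "drat", "qsec")]

# PCA on the predictors only; the response is deliberately left out
X   <- dat[, setdiff(names(dat), "mpg")]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Regress the response on the first two component scores
# (two is an arbitrary choice for illustration)
scores     <- as.data.frame(pca$x[, 1:2])
scores$mpg <- dat$mpg
fit <- lm(mpg ~ PC1 + PC2, data = scores)

rownames(pca$rotation)   # the response does not appear among the loadings
summary(fit)$r.squared
```

If the response were left in the matrix, every component would be a linear
combination that already contains the quantity being predicted.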


Uwe Ligges-3 wrote:
> 
> masterinex wrote:
>> 
>> Hi guys,
>> 
>> I'm trying to do principal component analysis in R. There are two ways of
>> doing it, I believe. One is to do principal component analysis right away;
>> the other is to standardize the matrix first using s = scale(m) and then
>> apply principal component analysis.
>> How do I tell which result is better? What values in particular should I
>> look at? I have already managed to find the eigenvalues and eigenvectors,
>> and the proportion of variance for each eigenvector, using both methods.
>> 
> 
> Generally, it is better to standardize. But in some cases, e.g. when your
> variables are in the same units and their scale also indicates their
> importance, it might make sense not to do so.
> You should think about the analysis: you cannot know which result is
> `better' unless you know an interpretation.
> 
>> I noticed that the proportion of variance for the first principal component
>> was larger without standardizing. Is there a meaning to it? Isn't this
>> always the case?
>> Lastly, if I am supposed to predict a variable, i.e. weight, should I drop
>> that variable from my data matrix when I do principal component analysis?
> 
> This sounds a bit like homework. If that is the case, please ask your
> teacher rather than this list.
> Anyway, it does not make sense to predict weight using a linear
> combination (principal component) that contains weight, does it?
> 
> Uwe Ligges
> 

-- 
View this message in context: 
http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26466400.html


[R] how to tell if its better to standardize your data matrix first when you do principal

2009-11-21 Thread masterinex



Hi guys,

I'm trying to do principal component analysis in R. There are two ways of
doing it, I believe. One is to do principal component analysis right away; the
other is to standardize the matrix first using s = scale(m) and then apply
principal component analysis.
How do I tell which result is better? What values in particular should I look
at? I have already managed to find the eigenvalues and eigenvectors, and the
proportion of variance for each eigenvector, using both methods.

I noticed that the proportion of variance for the first principal component
was larger without standardizing. Is there a meaning to it? Isn't this always
the case?
Lastly, if I am supposed to predict a variable, i.e. weight, should I drop
that variable from my data matrix when I do principal component analysis?
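
The comparison described above can be sketched as follows. This uses the
built-in USArrests data as a stand-in, since the poster's matrix m is not
available; prop_var is a small helper defined here, not a base R function.

```r
# Stand-in data with variables on very different scales
pca_raw <- prcomp(USArrests, scale. = FALSE)  # PCA on the covariance matrix
pca_std <- prcomp(USArrests, scale. = TRUE)   # PCA on standardized data

# Helper (defined here): proportion of variance per component
prop_var <- function(p) p$sdev^2 / sum(p$sdev^2)

round(prop_var(pca_raw), 3)
round(prop_var(pca_std), 3)

# Raw variances and PC1 loadings show which variable drives the unscaled result
round(apply(USArrests, 2, var), 1)
round(pca_raw$rotation[, 1], 3)
round(pca_std$rotation[, 1], 3)

# Standardizing first is equivalent to PCA on the correlation matrix
all.equal(pca_std$sdev^2, eigen(cor(USArrests))$values)
```

In this example the unscaled first component takes a larger share of the
variance, because it mostly tracks the single variable with the largest raw
variance. That is a common outcome when scales differ widely, but it is not
guaranteed in general, so a larger first proportion is not by itself evidence
that the unstandardized analysis is better.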
-- 
View this message in context: 
http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26462070.html