Re: [R] how to tell if its better to standardize your data matrix first when you do principal
Actually, it's for an assignment, Michael. All I'm looking for is some help and suggestions; please don't take it the wrong way. I do believe that this is a helpful community.

> > This sounds a bit like homework. If that is the case, please ask your
> > teacher rather than this list.
> > Anyway, it does not make sense to predict weight using a linear
> > combination (principal component) that contains weight, does it?
> >
> > Uwe Ligges
>
> It's likely to have been homework: a quick search on "masterinex" "xevilgang79"
> reveals which university this undergraduate student is at. It also produces a
> phone number, which can be used to look up an address, and a cell phone number.
>
> MK

--
View this message in context: http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26490273.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to tell if its better to standardize your data matrix first when you do principal
This is how my data matrix looks. This is just for the first 10 observations, but the pattern is similar for the other observations.

 1  12.3  154.25  67.75  36.2   93.1   85.2   94.5  59.0  37.3  21.9  32.0  27.4  17.1
 2   6.1  173.25  72.25  38.5   93.6   83.0   98.7  58.7  37.3  23.4  30.5  28.9  18.2
 3  25.3  154.00  66.25  34.0   95.8   87.9   99.2  59.6  38.9  24.0  28.8  25.2  16.6
 4  10.4  184.75  72.25  37.4  101.8   86.4  101.2  60.1  37.3  22.8  32.4  29.4  18.2
 5  28.7  184.25  71.25  34.4   97.3  100.0  101.9  63.2  42.2  24.0  32.2  27.7  17.7
 6  20.9  210.25  74.75  39.0  104.5   94.4  107.8  66.0  42.0  25.6  35.7  30.6  18.8
 7  19.2  181.00  69.75  36.4  105.1   90.7  100.3  58.4  38.3  22.9  31.9  27.8  17.7
 8  12.4  176.00  72.50  37.8   99.6   88.5   97.1  60.0  39.4  23.2  30.5  29.0  18.8
 9   4.1  191.00  74.00  38.1  100.9   82.5   99.9  62.9  38.3  23.8  35.9  31.1  18.2
10  11.7  198.25  73.50  42.1   99.6   88.6  104.1  63.1  41.7  25.0  35.6  30.0  19.2

And after standardizing it:

 1  -0.831228836  -0.898881671  -0.98330178  -0.77420686  -0.952294055  -0.712961621  -0.814552365  -0.0625400993  -0.53901713  -0.825399059  -0.08244945
 2  -1.588060506  -0.185928394   0.75868364   0.23560461  -0.889886435  -0.931523054  -0.155497233  -0.1252522485  -0.53901713   0.295114747  -0.59529632
 3   0.755676279  -0.908262635  -1.56396359  -1.74011349  -0.615292906  -0.444727135  -0.077038289   0.0628841989   0.15515266   0.743320270  -1.17652277
 4  -1.063161122   0.245595958   0.75868364  -0.24734870   0.133598535  -0.593746294   0.236797489   0.1674044475  -0.53901713  -0.153090775   0.05430971
 5   1.170713001   0.226834030   0.37157577  -1.56449410  -0.428070046   0.757360745   0.346640011   0.8154299886   1.58687786   0.743320270  -0.01406987
 6   0.218569932   1.202454304   1.72645331   0.45512884   0.470599683   0.201022552   1.27244       1.4007433805   1.50010664   1.938534997   1.18257281
 7   0.011051571   0.104881496  -0.20908604  -0.68639717   0.545488828  -0.166558039   0.095571389  -0.1879643976  -0.10516101  -0.078389855  -0.11663925
 8  -0.819021874  -0.082737788   0.85546060  -0.07172932  -0.140994994  -0.385119472  -0.406565855   0.1465003978   0.37208072   0.145712907  -0.59529632
 9  -1.832199755   0.480120063   1.43612241   0.05998522   0.021264819  -0.981196107   0.032804234   0.7527178395  -0.10516101   0.593918429   1.25095239
10  -0.904470611   0.752168024   1.24256848   1.81617909  -0.140994994  -0.375184861   0.691859366   0.7945259389   1.36994980   1.490329474   1.14838302

This is the result of applying PCA to the data matrix:

Standard deviations:
 [1] 30.6645414  7.5513852  3.6927427  2.8703435  2.5363007  1.9136933  1.5624131  1.3689630  1.2976189
[10]  1.1633458  1.1118231  0.7847148  0.4802303

Rotation:
              PC1          PC2          PC3           PC4           PC5           PC6           PC7          PC8
var1   0.18110712  -0.74864138  -0.46070566  -0.365658769   0.192810075  -0.132529979   0.023764851   0.03674873
var2   0.86458284   0.34243386  -0.05766909  -0.235504989  -0.046075934   0.001493006  -0.024535011   0.13439659
var3   0.03765598   0.20097537  -0.15709612  -0.343218776  -0.295201121  -0.073295697  -0.086930370  -0.54389141
var4   0.05965733   0.01737951   0.09854179  -0.030801791   0.125735684   0.341795876  -0.001735808   0.37152696
var5   0.23845698  -0.20616399   0.68948870   0.025904812   0.391188182  -0.428933369  -0.101780281  -0.16965893
var6   0.29928369  -0.47394636   0.24791449   0.341235161  -0.511378719   0.447071255  -0.077534385  -0.13198544
var7   0.19503685   0.01385823  -0.24126047   0.531403827  -0.127426510  -0.410568454   0.608163973  -0.01265457
var8   0.13261863   0.06839078  -0.37740589   0.535332339   0.366103479   0.032376851  -0.574484605  -0.05645694
var9   0.06246705   0.04407384  -0.09545362   0.037993146  -0.036651080   0.012347288  -0.192976142  -0.13027876
var10  0.03027791   0.05533988  -0.03749859  -0.009257423   0.011026593  -0.010770032  -0.104041067   0.12125263
var11  0.07435322   0.04334969  -0.02666944   0.032036374   0.464035624   0.454970952   0.347507539  -0.60527541
var12  0.04328710   0.04731771   0.00360668  -0.054200633   0.275901346   0.297800123   0.324323749   0.30487145
var13  0.02095652   0.02146485   0.03598618  -0.022510780   0.005192075   0.103988977   0.031541374   0.07877455

                PC9          PC10          PC11         PC12          PC13
var1   -0.005328345   0.030549780  -0.049283616  -0.02211988   0.015660892
var2    0.170766596  -0.144031738   0.028862963   0.06984674   0.006293703
var3   -0.282549313   0.548650592   0.131284937  -0.14740722  -0.002384605
var4    0.024070488   0.614154008  -0.551480394  -0.03446124  -0.178123011
var5   -0.157551008   0.147685248   0.008044148  -0.04068258   0.007778992
var6   -0.058675551   0.006344813   0.130814072  -0.04088919  -0.028655330
var7   -0.099243751   0.171852216  -0.149231752  -0.06690208  -0.014693444
var8    0.006629025   0.199158097   0.187226774  -0.02511968   0.070896819
var9   -0.658214712  -0.320120384  -0.53990       0.37630539  -0.023642902
var10  -0.259704149  -0.273030750  -0.074006053  -0.83676032  -0.348034215
var11   0.157450716
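The standard deviations posted above can be turned into proportions of variance directly, which is one of the quantities asked about in this thread. The following is a minimal sketch in R that only reuses the 13 values printed in the post; the variable names (sdev, eigenvalues, prop, cum_prop) are illustrative and not taken from the poster's session.

```r
# Standard deviations copied from the PCA output posted above
sdev <- c(30.6645414, 7.5513852, 3.6927427, 2.8703435, 2.5363007,
          1.9136933, 1.5624131, 1.3689630, 1.2976189, 1.1633458,
          1.1118231, 0.7847148, 0.4802303)

# The variance of each component is the square of its standard deviation
eigenvalues <- sdev^2

# Share of the total variance carried by each component, and the running total
prop     <- eigenvalues / sum(eigenvalues)
cum_prop <- cumsum(prop)

print(round(rbind(proportion = prop, cumulative = cum_prop), 4))
```

The squared standard deviations add up to far more than 13, the number of variables, so this output appears to come from the unstandardized matrix. Together with the large PC1 loading on var2 (0.86), that suggests the first component mostly reflects the column with the largest spread rather than a pattern shared by all variables.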
Re: [R] how to tell if its better to standardize your data matrix first when you do principal
Hi Hadley,

I really appreciate the suggestions you gave. They were helpful, but I still didn't quite get it all, and I really want to do a good job, so any comments would be welcome. Please understand.

hadley wrote:
>
> You've asked the same question on stackoverflow.com and received the
> same answer. This is rude because it duplicates effort. If you
> urgently need a response to a question, perhaps you should consider
> paying for it.
>
> Hadley
>
> On Sun, Nov 22, 2009 at 12:04 PM, masterinex wrote:
>>
>> So in which cases is it better to standardize the data matrix first?
>> Also, is PCA generally used to predict the response variable, and should I
>> keep that variable in my data matrix?
>>
>
> --
> http://had.co.nz/

--
View this message in context: http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26471673.html
Re: [R] how to tell if its better to standardize your data matrix first when you do principal
So in which cases is it better to standardize the data matrix first? Also, is PCA generally used to predict the response variable, and should I keep that variable in my data matrix?

Uwe Ligges-3 wrote:
>
> masterinex wrote:
>>
>> Hi guys,
>>
>> I'm trying to do principal component analysis in R. There are 2 ways of
>> doing it, I believe. One is doing principal component analysis right away;
>> the other is standardizing the matrix first using s = scale(m) and then
>> applying principal component analysis.
>> How do I tell which result is better? What values in particular should I
>> look at? I already managed to find the eigenvalues and eigenvectors, and
>> the proportion of variance for each eigenvector, using both methods.
>>
>
> Generally, it is better to standardize. But in some cases, e.g. when your
> variables share the same units and their scale also indicates their
> importance, it might make sense not to do so.
> You should think about the analysis; you cannot know which result is
> `better' unless you know an interpretation.
>
>> I noticed that the proportion of the variance for the first principal
>> component without standardizing had a larger value. Is there a meaning
>> to it? Isn't this always the case?
>> Lastly, if I am supposed to predict a variable, i.e. weight, should I
>> drop that variable from my data matrix when I do principal component
>> analysis?
>
> This sounds a bit like homework. If that is the case, please ask your
> teacher rather than this list.
> Anyway, it does not make sense to predict weight using a linear
> combination (principal component) that contains weight, does it?
>
> Uwe Ligges

--
View this message in context: http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26466400.html
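Uwe's last point, that a component containing weight should not be used to predict weight, can be illustrated with a small principal component regression. This is only a sketch on simulated data: the predictor names (chest, hip, wrist), the latent size variable and all numeric settings are assumptions made up for the example, not the poster's data. The response is kept out of the matrix passed to prcomp() and is then regressed on the first two component scores.

```r
set.seed(2)
n <- 100

# Hypothetical predictors that share a common latent "size" factor
size <- rnorm(n)
x <- cbind(chest = 100 + 8   * size + rnorm(n, sd = 3),
           hip   = 100 + 7   * size + rnorm(n, sd = 3),
           wrist = 18  + 0.8 * size + rnorm(n, sd = 0.5))

# Hypothetical response; it is deliberately NOT a column of x
weight <- 180 + 25 * size + rnorm(n, sd = 8)

# PCA on the predictors only, standardized because the scales differ
pca <- prcomp(x, scale. = TRUE)

# Use the scores on the first two components as regressors
scores <- pca$x[, 1:2]
fit    <- lm(weight ~ scores)
r2     <- summary(fit)$r.squared

print(rownames(pca$rotation))   # variables that entered the PCA
print(r2)
```

How many components to keep is a separate choice; two are used here purely for illustration.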
[R] how to tell if its better to standardize your data matrix first when you do principal
Hi guys,

I'm trying to do principal component analysis in R. There are 2 ways of doing it, I believe. One is doing principal component analysis right away; the other is standardizing the matrix first using s = scale(m) and then applying principal component analysis.

How do I tell which result is better? What values in particular should I look at? I already managed to find the eigenvalues and eigenvectors, and the proportion of variance for each eigenvector, using both methods.

I noticed that the proportion of the variance for the first principal component without standardizing had a larger value. Is there a meaning to it? Isn't this always the case?

Lastly, if I am supposed to predict a variable, i.e. weight, should I drop that variable from my data matrix when I do principal component analysis?

--
View this message in context: http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26462070.html
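The two approaches described in the question can be compared side by side. Below is a minimal sketch in R on simulated data: the matrix m and its columns (weight, height, wrist) are hypothetical stand-ins with very different scales, not the poster's data. It runs prcomp() on the raw matrix and on the standardized matrix, computes the proportion of variance for each, and checks that scaling by hand with scale() matches prcomp(m, scale. = TRUE).

```r
set.seed(1)
n <- 100

# Hypothetical stand-in for the data matrix `m`: independent columns
# measured on very different scales
m <- cbind(weight = rnorm(n, mean = 180, sd = 30),
           height = rnorm(n, mean = 70,  sd = 3),
           wrist  = rnorm(n, mean = 18,  sd = 1))

pca_raw <- prcomp(m)                 # centred only: based on the covariance matrix
pca_std <- prcomp(m, scale. = TRUE)  # centred and scaled: based on the correlation matrix

# Proportion of variance carried by each component
prop_raw <- pca_raw$sdev^2 / sum(pca_raw$sdev^2)
prop_std <- pca_std$sdev^2 / sum(pca_std$sdev^2)

# Standardizing by hand with scale() gives the same component variances
pca_manual <- prcomp(scale(m))
same_sdev  <- isTRUE(all.equal(pca_manual$sdev, pca_std$sdev))

print(round(prop_raw, 3))
print(round(prop_std, 3))
print(same_sdev)
```

In this simulated setting the unscaled PCA puts almost all the variance in the first component simply because one column has a much larger variance than the others. A larger first proportion therefore says more about measurement units than about which analysis is better, and it need not hold for every data set.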