Re: [R] how to tell if its better to standardize your data matrix first when you do principal
Actually, it's for an assignment, Michael. All I'm looking for is some help and suggestions; please don't get it wrong. I do believe that this is a helpful community.

> > This sounds a bit like homework. If that is the case, please ask your
> > teacher rather than this list.
> > Anyway, it does not make sense to predict weight using a linear
> > combination (principal component) that contains weight, does it?
> >
> > Uwe Ligges
>
> It's likely to have been homework: a quick search on "masterinex"
> "xevilgang79" reveals which university this undergraduate student is at.
> It also produces a phone number, which can be used to look up an address,
> and a cell phone number.
>
> MK

--
View this message in context: http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26490273.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to tell if its better to standardize your data matrix first when you do principal
On Nov 22, 2009, at 10:22 AM, Uwe Ligges wrote:

> masterinex wrote:
>> Hi guys, I'm trying to do principal component analysis in R. There are
>> two ways of doing it, I believe. One is doing principal component
>> analysis right away; the other is standardizing the matrix first using
>> s = scale(m) and then applying principal component analysis. How do I
>> tell which result is better? What values in particular should I look at?
>> I already managed to find the eigenvalues and eigenvectors, and the
>> proportion of variance for each eigenvector, using both methods.
>
> Generally, it is better to standardize. But in some cases, e.g. when the
> variables share the same units and their magnitudes also indicate their
> importance, it might make sense not to do so.
> You should think about the analysis: you cannot know which result is
> `better' unless you know an interpretation.
>
>> I noticed that the proportion of variance for the first principal
>> component without standardizing was larger. Is there a meaning to it?
>> Isn't this always the case?
>> Lastly, if I am supposed to predict a variable, e.g. weight, should I
>> drop that variable from my data matrix when I do principal component
>> analysis?
>
> This sounds a bit like homework. If that is the case, please ask your
> teacher rather than this list.
> Anyway, it does not make sense to predict weight using a linear
> combination (principal component) that contains weight, does it?
>
> Uwe Ligges

It's likely to have been homework: a quick search on "masterinex" "xevilgang79" reveals which university this undergraduate student is at. It also produces a phone number, which can be used to look up an address, and a cell phone number.

MK
Re: [R] how to tell if its better to standardize your data matrix first when you do principal
masterinex wrote:
> Hi Hadley, I really appreciate the suggestions you gave. They were
> helpful, but I still didn't quite get it all, and I really want to do a
> good job, so any comments would be helpful. Please understand me.

Well, we try to understand you, but we do not either. I think you really need to consult some statistics textbook on PCA if my answer was not sufficient. Given your questions, I doubt you understand what PCA does and how it works. It does not predict anything.

Uwe Ligges

> hadley wrote:
>> You've asked the same question on stackoverflow.com and received the
>> same answer. This is rude because it duplicates effort. If you urgently
>> need a response to a question, perhaps you should consider paying for it.
>>
>> Hadley
>>
>> On Sun, Nov 22, 2009 at 12:04 PM, masterinex wrote:
>>> So in which cases is it better to standardize the data matrix first?
>>> Also, is PCA generally used to predict the response variable, and
>>> should I keep that variable in my data matrix?
>>>
>>> Uwe Ligges-3 wrote:
>>>> masterinex wrote:
>>>>> Hi guys, I'm trying to do principal component analysis in R. There
>>>>> are two ways of doing it, I believe. One is doing principal component
>>>>> analysis right away; the other is standardizing the matrix first
>>>>> using s = scale(m) and then applying principal component analysis.
>>>>> How do I tell which result is better? What values in particular
>>>>> should I look at? I already managed to find the eigenvalues and
>>>>> eigenvectors, and the proportion of variance for each eigenvector,
>>>>> using both methods.
>>>>
>>>> Generally, it is better to standardize. But in some cases, e.g. when
>>>> the variables share the same units and their magnitudes also indicate
>>>> their importance, it might make sense not to do so.
>>>> You should think about the analysis: you cannot know which result is
>>>> `better' unless you know an interpretation.
>>>>
>>>>> I noticed that the proportion of variance for the first principal
>>>>> component without standardizing was larger. Is there a meaning to
>>>>> it? Isn't this always the case?
>>>>> Lastly, if I am supposed to predict a variable, e.g. weight, should
>>>>> I drop that variable from my data matrix when I do principal
>>>>> component analysis?
>>>>
>>>> This sounds a bit like homework. If that is the case, please ask your
>>>> teacher rather than this list.
>>>> Anyway, it does not make sense to predict weight using a linear
>>>> combination (principal component) that contains weight, does it?
>>>>
>>>> Uwe Ligges
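A note on the prediction question raised in this exchange: PCA by itself only re-expresses the columns it is given, so if a response is to be predicted, one common pattern (principal component regression, which nobody in the thread proposes explicitly) is to run PCA on the predictors only and then regress the response on the component scores. The following is a minimal sketch under stated assumptions: the built-in mtcars data stands in for the poster's matrix, mpg plays the role of "weight", and keeping two components is an arbitrary choice.

```r
# Sketch: principal component regression on a built-in dataset (mtcars).
# mpg plays the role of the response ("weight" in the thread); this is an
# illustrative assumption, not the poster's data.
response   <- mtcars$mpg
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]

# PCA on the predictors only; the response is left out of the matrix
pca <- prcomp(predictors, center = TRUE, scale. = TRUE)

# Keep the first two component scores (two is an arbitrary choice here)
scores <- as.data.frame(pca$x[, 1:2])

# Ordinary least-squares regression of the response on the scores
fit <- lm(response ~ PC1 + PC2, data = scores)
print(summary(fit)$r.squared)
```

Because the response never enters prcomp(), no component is a linear combination that contains it, which is the circularity Uwe Ligges points out.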
Re: [R] how to tell if its better to standardize your data matrix first when you do principal
this is how my data matrix looks like. This is just for the first 10 observations, but the pattern is similar for the other observations.

 1  12.3  154.25  67.75  36.2   93.1   85.2   94.5  59.0  37.3  21.9  32.0  27.4  17.1
 2   6.1  173.25  72.25  38.5   93.6   83.0   98.7  58.7  37.3  23.4  30.5  28.9  18.2
 3  25.3  154.00  66.25  34.0   95.8   87.9   99.2  59.6  38.9  24.0  28.8  25.2  16.6
 4  10.4  184.75  72.25  37.4  101.8   86.4  101.2  60.1  37.3  22.8  32.4  29.4  18.2
 5  28.7  184.25  71.25  34.4   97.3  100.0  101.9  63.2  42.2  24.0  32.2  27.7  17.7
 6  20.9  210.25  74.75  39.0  104.5   94.4  107.8  66.0  42.0  25.6  35.7  30.6  18.8
 7  19.2  181.00  69.75  36.4  105.1   90.7  100.3  58.4  38.3  22.9  31.9  27.8  17.7
 8  12.4  176.00  72.50  37.8   99.6   88.5   97.1  60.0  39.4  23.2  30.5  29.0  18.8
 9   4.1  191.00  74.00  38.1  100.9   82.5   99.9  62.9  38.3  23.8  35.9  31.1  18.2
10  11.7  198.25  73.50  42.1   99.6   88.6  104.1  63.1  41.7  25.0  35.6  30.0  19.2

and after standardizing it.

 1  -0.831228836  -0.898881671  -0.98330178  -0.77420686  -0.952294055  -0.712961621  -0.814552365  -0.0625400993  -0.53901713  -0.825399059  -0.08244945
 2  -1.588060506  -0.185928394   0.75868364   0.23560461  -0.889886435  -0.931523054  -0.155497233  -0.1252522485  -0.53901713   0.295114747  -0.59529632
 3   0.755676279  -0.908262635  -1.56396359  -1.74011349  -0.615292906  -0.444727135  -0.077038289   0.0628841989   0.15515266   0.743320270  -1.17652277
 4  -1.063161122   0.245595958   0.75868364  -0.24734870   0.133598535  -0.593746294   0.236797489   0.1674044475  -0.53901713  -0.153090775   0.05430971
 5   1.170713001   0.226834030   0.37157577  -1.56449410  -0.428070046   0.757360745   0.346640011   0.8154299886   1.58687786   0.743320270  -0.01406987
 6   0.218569932   1.202454304   1.72645331   0.45512884   0.470599683   0.201022552   1.27244       1.4007433805   1.50010664   1.938534997   1.18257281
 7   0.011051571   0.104881496  -0.20908604  -0.68639717   0.545488828  -0.166558039   0.095571389  -0.1879643976  -0.10516101  -0.078389855  -0.11663925
 8  -0.819021874  -0.082737788   0.85546060  -0.07172932  -0.140994994  -0.385119472  -0.406565855   0.1465003978   0.37208072   0.145712907  -0.59529632
 9  -1.832199755   0.480120063   1.43612241   0.05998522   0.021264819  -0.981196107   0.032804234   0.7527178395  -0.10516101   0.593918429   1.25095239
10  -0.904470611   0.752168024   1.24256848   1.81617909  -0.140994994  -0.375184861   0.691859366   0.7945259389   1.36994980   1.490329474   1.14838302

this is the result of applying PCA to the data matrix

Standard deviations:
 [1] 30.6645414  7.5513852  3.6927427  2.8703435  2.5363007  1.9136933  1.5624131  1.3689630  1.2976189
[10]  1.1633458  1.1118231  0.7847148  0.4802303

Rotation:
              PC1          PC2          PC3           PC4           PC5           PC6           PC7          PC8
var1   0.18110712  -0.74864138  -0.46070566  -0.365658769   0.192810075  -0.132529979   0.023764851   0.03674873
var2   0.86458284   0.34243386  -0.05766909  -0.235504989  -0.046075934   0.001493006  -0.024535011   0.13439659
var3   0.03765598   0.20097537  -0.15709612  -0.343218776  -0.295201121  -0.073295697  -0.086930370  -0.54389141
var4   0.05965733   0.01737951   0.09854179  -0.030801791   0.125735684   0.341795876  -0.001735808   0.37152696
var5   0.23845698  -0.20616399   0.68948870   0.025904812   0.391188182  -0.428933369  -0.101780281  -0.16965893
var6   0.29928369  -0.47394636   0.24791449   0.341235161  -0.511378719   0.447071255  -0.077534385  -0.13198544
var7   0.19503685   0.01385823  -0.24126047   0.531403827  -0.127426510  -0.410568454   0.608163973  -0.01265457
var8   0.13261863   0.06839078  -0.37740589   0.535332339   0.366103479   0.032376851  -0.574484605  -0.05645694
var9   0.06246705   0.04407384  -0.09545362   0.037993146  -0.036651080   0.012347288  -0.192976142  -0.13027876
var10  0.03027791   0.05533988  -0.03749859  -0.009257423   0.011026593  -0.010770032  -0.104041067   0.12125263
var11  0.07435322   0.04334969  -0.02666944   0.032036374   0.464035624   0.454970952   0.347507539  -0.60527541
var12  0.04328710   0.04731771   0.00360668  -0.054200633   0.275901346   0.297800123   0.324323749   0.30487145
var13  0.02095652   0.02146485   0.03598618  -0.022510780   0.005192075   0.103988977   0.031541374   0.07877455
                PC9          PC10          PC11         PC12          PC13
var1   -0.005328345   0.030549780  -0.049283616  -0.02211988   0.015660892
var2    0.170766596  -0.144031738   0.028862963   0.06984674   0.006293703
var3   -0.282549313   0.548650592   0.131284937  -0.14740722  -0.002384605
var4    0.024070488   0.614154008  -0.551480394  -0.03446124  -0.178123011
var5   -0.157551008   0.147685248   0.008044148  -0.04068258   0.007778992
var6   -0.058675551   0.006344813   0.130814072  -0.04088919  -0.028655330
var7   -0.099243751   0.171852216  -0.149231752  -0.06690208  -0.014693444
var8    0.006629025   0.199158097   0.187226774  -0.02511968   0.070896819
var9   -0.658214712  -0.320120384  -0.53990       0.37630539  -0.023642902
var10  -0.259704149  -0.273030750  -0.074006053  -0.83676032  -0.348034215
var11   0.157450716
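For readers following along, the proportion of variance per component can be recovered from the "Standard deviations" line in the output above, since each component's variance is its standard deviation squared. A minimal sketch, assuming the 13 values are copied correctly from the post (the variable names sdev, prop_var and cum_var are illustrative):

```r
# Standard deviations copied from the posted (unstandardized) PCA output
sdev <- c(30.6645414, 7.5513852, 3.6927427, 2.8703435, 2.5363007,
          1.9136933, 1.5624131, 1.3689630, 1.2976189, 1.1633458,
          1.1118231, 0.7847148, 0.4802303)

# Variance of each component is sdev^2; divide by the total for the share
prop_var <- sdev^2 / sum(sdev^2)

# Running total of the shares across components
cum_var <- cumsum(prop_var)

print(round(prop_var, 4))
print(round(cum_var, 4))
```

With these numbers the first component carries roughly 90% of the total variance, and its largest loading in the rotation matrix (0.8646) sits on var2, the column with the largest raw values in the data shown. That pattern is what one would expect when unstandardized columns differ a lot in scale.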
Re: [R] how to tell if its better to standardize your data matrix first when you do principal
Hi Hadley, I really appreciate the suggestions you gave. They were helpful, but I still didn't quite get it all, and I really want to do a good job, so any comments would be helpful. Please understand me.

hadley wrote:
> You've asked the same question on stackoverflow.com and received the
> same answer. This is rude because it duplicates effort. If you
> urgently need a response to a question, perhaps you should consider
> paying for it.
>
> Hadley
>
> On Sun, Nov 22, 2009 at 12:04 PM, masterinex wrote:
>> So in which cases is it better to standardize the data matrix first?
>> Also, is PCA generally used to predict the response variable, and
>> should I keep that variable in my data matrix?
>>
>> Uwe Ligges-3 wrote:
>>> masterinex wrote:
>>>> Hi guys, I'm trying to do principal component analysis in R. There
>>>> are two ways of doing it, I believe. One is doing principal component
>>>> analysis right away; the other is standardizing the matrix first
>>>> using s = scale(m) and then applying principal component analysis.
>>>> How do I tell which result is better? What values in particular
>>>> should I look at? I already managed to find the eigenvalues and
>>>> eigenvectors, and the proportion of variance for each eigenvector,
>>>> using both methods.
>>>
>>> Generally, it is better to standardize. But in some cases, e.g. when
>>> the variables share the same units and their magnitudes also indicate
>>> their importance, it might make sense not to do so.
>>> You should think about the analysis: you cannot know which result is
>>> `better' unless you know an interpretation.
>>>
>>>> I noticed that the proportion of variance for the first principal
>>>> component without standardizing was larger. Is there a meaning to
>>>> it? Isn't this always the case?
>>>> Lastly, if I am supposed to predict a variable, e.g. weight, should
>>>> I drop that variable from my data matrix when I do principal
>>>> component analysis?
>>>
>>> This sounds a bit like homework. If that is the case, please ask your
>>> teacher rather than this list.
>>> Anyway, it does not make sense to predict weight using a linear
>>> combination (principal component) that contains weight, does it?
>>>
>>> Uwe Ligges

--
View this message in context: http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26471673.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] how to tell if its better to standardize your data matrix first when you do principal
You've asked the same question on stackoverflow.com and received the same answer. This is rude because it duplicates effort. If you urgently need a response to a question, perhaps you should consider paying for it.

Hadley

On Sun, Nov 22, 2009 at 12:04 PM, masterinex wrote:
> So in which cases is it better to standardize the data matrix first?
> Also, is PCA generally used to predict the response variable, and
> should I keep that variable in my data matrix?
>
> Uwe Ligges-3 wrote:
>> masterinex wrote:
>>> Hi guys,
>>>
>>> I'm trying to do principal component analysis in R. There are two ways
>>> of doing it, I believe.
>>> One is doing principal component analysis right away; the other is
>>> standardizing the matrix first using s = scale(m) and then applying
>>> principal component analysis.
>>> How do I tell which result is better? What values in particular should
>>> I look at? I already managed to find the eigenvalues and eigenvectors,
>>> and the proportion of variance for each eigenvector, using both
>>> methods.
>>
>> Generally, it is better to standardize. But in some cases, e.g. when
>> the variables share the same units and their magnitudes also indicate
>> their importance, it might make sense not to do so.
>> You should think about the analysis: you cannot know which result is
>> `better' unless you know an interpretation.
>>
>>> I noticed that the proportion of variance for the first principal
>>> component without standardizing was larger. Is there a meaning to it?
>>> Isn't this always the case?
>>> Lastly, if I am supposed to predict a variable, e.g. weight, should I
>>> drop that variable from my data matrix when I do principal component
>>> analysis?
>>
>> This sounds a bit like homework. If that is the case, please ask your
>> teacher rather than this list.
>> Anyway, it does not make sense to predict weight using a linear
>> combination (principal component) that contains weight, does it?
>>
>> Uwe Ligges

--
http://had.co.nz/
Re: [R] how to tell if its better to standardize your data matrix first when you do principal
So in which cases is it better to standardize the data matrix first? Also, is PCA generally used to predict the response variable, and should I keep that variable in my data matrix?

Uwe Ligges-3 wrote:
> masterinex wrote:
>> Hi guys,
>>
>> I'm trying to do principal component analysis in R. There are two ways
>> of doing it, I believe.
>> One is doing principal component analysis right away; the other is
>> standardizing the matrix first using s = scale(m) and then applying
>> principal component analysis.
>> How do I tell which result is better? What values in particular should
>> I look at? I already managed to find the eigenvalues and eigenvectors,
>> and the proportion of variance for each eigenvector, using both methods.
>
> Generally, it is better to standardize. But in some cases, e.g. when the
> variables share the same units and their magnitudes also indicate their
> importance, it might make sense not to do so.
> You should think about the analysis: you cannot know which result is
> `better' unless you know an interpretation.
>
>> I noticed that the proportion of variance for the first principal
>> component without standardizing was larger. Is there a meaning to it?
>> Isn't this always the case?
>> Lastly, if I am supposed to predict a variable, e.g. weight, should I
>> drop that variable from my data matrix when I do principal component
>> analysis?
>
> This sounds a bit like homework. If that is the case, please ask your
> teacher rather than this list.
> Anyway, it does not make sense to predict weight using a linear
> combination (principal component) that contains weight, does it?
>
> Uwe Ligges

--
View this message in context: http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26466400.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] how to tell if its better to standardize your data matrix first when you do principal
masterinex wrote:
> Hi guys,
>
> I'm trying to do principal component analysis in R. There are two ways of
> doing it, I believe. One is doing principal component analysis right away;
> the other is standardizing the matrix first using s = scale(m) and then
> applying principal component analysis.
> How do I tell which result is better? What values in particular should I
> look at? I already managed to find the eigenvalues and eigenvectors, and
> the proportion of variance for each eigenvector, using both methods.

Generally, it is better to standardize. But in some cases, e.g. when the variables share the same units and their magnitudes also indicate their importance, it might make sense not to do so.
You should think about the analysis: you cannot know which result is `better' unless you know an interpretation.

> I noticed that the proportion of variance for the first principal
> component without standardizing was larger. Is there a meaning to it?
> Isn't this always the case?
> Lastly, if I am supposed to predict a variable, e.g. weight, should I drop
> that variable from my data matrix when I do principal component analysis?

This sounds a bit like homework. If that is the case, please ask your teacher rather than this list.
Anyway, it does not make sense to predict weight using a linear combination (principal component) that contains weight, does it?

Uwe Ligges
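To make the two approaches discussed in this message concrete, here is a minimal sketch comparing PCA on raw and on standardized columns. It assumes the built-in USArrests data as a stand-in for the poster's matrix m (the real data is not available here), and it uses prcomp() from base R's stats package; the thread itself does not say which PCA function was used.

```r
# Sketch: compare PCA on raw vs. standardized columns.
# USArrests ships with base R and stands in for the poster's matrix m.
m <- as.matrix(USArrests)

pca_raw    <- prcomp(m, center = TRUE, scale. = FALSE)  # covariance-based
pca_scaled <- prcomp(m, center = TRUE, scale. = TRUE)   # correlation-based

# Proportion of variance explained by each component
prop_raw    <- pca_raw$sdev^2 / sum(pca_raw$sdev^2)
prop_scaled <- pca_scaled$sdev^2 / sum(pca_scaled$sdev^2)

# Column variances show which variable dominates the raw analysis
col_var <- apply(m, 2, var)

print(round(col_var, 1))
print(round(prop_raw, 3))
print(round(prop_scaled, 3))

# Standardizing by hand with scale(m) matches scale. = TRUE
pca_manual <- prcomp(scale(m))
```

In this stand-in data the unscaled first component takes a larger share of the variance because one column (Assault) has a far larger variance than the others. That is typical when columns differ in scale, but it is not guaranteed in general, so a larger first-component share is not by itself evidence that the unstandardized analysis is better.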
[R] how to tell if its better to standardize your data matrix first when you do principal
Hi guys,

I'm trying to do principal component analysis in R. There are two ways of doing it, I believe. One is doing principal component analysis right away; the other is standardizing the matrix first using s = scale(m) and then applying principal component analysis.

How do I tell which result is better? What values in particular should I look at? I already managed to find the eigenvalues and eigenvectors, and the proportion of variance for each eigenvector, using both methods.

I noticed that the proportion of variance for the first principal component without standardizing was larger. Is there a meaning to it? Isn't this always the case?

Lastly, if I am supposed to predict a variable, e.g. weight, should I drop that variable from my data matrix when I do principal component analysis?

--
View this message in context: http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26462070.html
Sent from the R help mailing list archive at Nabble.com.