And this you have likely seen already on Wikipedia: https://en.wikipedia.org/wiki/Principal_component_analysis

"...PCA is mostly used as a tool in exploratory data analysis and for making predictive models. It's often used to visualize genetic distance and relatedness between populations. PCA can be done by eigenvalue decomposition of a data covariance (or correlation) matrix or singular value decomposition of a data matrix, usually after mean centering (and normalizing or using Z-scores) the data matrix for each attribute. The results of a PCA are usually discussed in terms of component scores, sometimes called factor scores (the transformed variable values corresponding to a particular data point), and loadings (the weight by which each standardized original variable should be multiplied to get the component score)..."
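The two routes the quote mentions (eigendecomposition of the covariance matrix vs. SVD of the mean-centered data matrix) give the same components. A minimal NumPy sketch, not from the thread, with random data just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Mean-center each attribute (column), as the quote describes.
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
order = np.argsort(eigvals)[::-1]               # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = S**2 / (Xc.shape[0] - 1)             # singular values -> variances

# The two routes agree (each component only up to a sign flip).
assert np.allclose(eigvals, svd_vals)
assert np.allclose(np.abs(eigvecs), np.abs(Vt.T))

# Component scores: project the centered data onto the loadings.
scores = Xc @ eigvecs
```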
On Saturday, May 26, 2018, 10:10:32 PM PDT, Shiheng Duan <shid...@ucdavis.edu> wrote:

Thanks. Do you mean that if feature one has a larger standard deviation than feature two, after z-scoring they will have the same weight? In that case, it is a bias, right? Feature one should be more important than feature two in the PCA.

On Thu, May 24, 2018 at 5:09 PM, Michael Eickenberg <michael.eickenb...@gmail.com> wrote:

Hi,

That totally depends on the nature of your data and whether the standard deviation of individual feature axes/columns of your data carries some form of importance measure. Note that PCA will bias its loadings towards columns with large standard deviations, all else being held equal (meaning that if you have z-scored columns and then choose one column and multiply it by, say, 1000, that column will likely show up in your first component [if 1000 is comparable to or large with respect to the number of features you are using]).

Does this help?
Michael

On Thu, May 24, 2018 at 4:39 PM, Shiheng Duan <shid...@ucdavis.edu> wrote:

Hello all,

I wonder whether it is necessary or correct to do a z-score transformation before PCA. I didn't see any preprocessing of the face images in the "Faces recognition example using eigenfaces and SVMs" example: http://scikit-learn.org/stable/auto_examples/applications/plot_face_recognition.html#sphx-glr-auto-examples-applications-plot-face-recognition-py

I am working on a similar dataset and got a weird result when I standardized the data before PCA: the components figure has a strong gradient and doesn't make any sense. Any ideas about the reason? Thanks.
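Michael's point is easy to check numerically: start from unit-variance (z-scored-like) columns, blow one up by 1000, and the first principal component concentrates on that column. A small sketch with synthetic data (the column index and factor are illustrative, not from the thread):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # independent unit-variance columns

# Inflate one column: its variance (~1e6) now dwarfs the others (~1 each).
X_scaled = X.copy()
X_scaled[:, 2] *= 1000.0

pca = PCA(n_components=2).fit(X_scaled)

# The first component's loadings land almost entirely on column 2,
# and that component absorbs nearly all of the variance.
dominant = np.argmax(np.abs(pca.components_[0]))
print(dominant, pca.explained_variance_ratio_[0])
```

This is exactly why z-scoring first puts all features on equal footing, and why it can be undesirable when the raw scales (e.g. pixel intensities in the eigenfaces example) genuinely carry information.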
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn