[ https://issues.apache.org/jira/browse/SYSTEMML-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frederick Reiss updated SYSTEMML-1146:
--------------------------------------
    Assignee: Prithviraj Sen

> Improve PCA description in documentation
> ----------------------------------------
>
>                 Key: SYSTEMML-1146
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1146
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Deron Eriksson
>            Assignee: Prithviraj Sen
>            Priority: Minor
>
> David P. Nichols reports that the first sentence of the PCA description in
> the Algorithms Reference is inaccurate
> (http://apache.github.io/incubator-systemml/algorithms-matrix-factorization.html#principal-component-analysis).
>
> "Principal Component Analysis (PCA) is a simple, non-parametric method to
> transform the given data set with possibly correlated columns into a set of
> linearly uncorrelated or orthogonal columns, called principal components."
>
> The problem with this statement is that principal component scores typically
> will not be uncorrelated unless the input data have been centered (or began
> with means of 0). Orthogonal and uncorrelated are not the same thing. Whether
> or not two vectors are orthogonal is a function of the raw values, while
> covariance and hence correlation are functions of the centered values.
>
> It looks like the text was taken from Wikipedia's Principal component
> analysis entry. Whoever wrote that part of that entry seems to be assuming
> that principal components analysis always involves working on a matrix of
> centered (or centered and scaled) data, but that is not always the case. The
> default in SystemML is not to center input columns, so typically resulting
> data columns will not be uncorrelated, though they will be orthogonal.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
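
The distinction drawn in the report (orthogonal versus uncorrelated score columns) can be checked numerically. The following is a minimal NumPy sketch, not SystemML's PCA script: the data set, the helper names `pca_scores` and `cosine_and_corr`, and the choice to take eigenvectors of the cross-product matrix A^T A are all assumptions made for illustration. Scores computed without centering should come out orthogonal but correlated, while scores computed after centering should be both orthogonal and uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two correlated columns whose means are far from zero.
n = 500
x = rng.normal(loc=5.0, scale=1.0, size=n)
y = 0.5 * x + rng.normal(loc=3.0, scale=0.5, size=n)
X = np.column_stack([x, y])


def pca_scores(X, center):
    """Project X onto the eigenvectors of A^T A, where A is X,
    optionally with the column means subtracted first."""
    A = X - X.mean(axis=0) if center else X
    _, eigvecs = np.linalg.eigh(A.T @ A)
    return A @ eigvecs


def cosine_and_corr(S):
    """Return (cosine of the angle between the two score columns,
    Pearson correlation between them)."""
    a, b = S[:, 0], S[:, 1]
    cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    corr = float(np.corrcoef(a, b)[0, 1])
    return cosine, corr


cos_raw, corr_raw = cosine_and_corr(pca_scores(X, center=False))
cos_ctr, corr_ctr = cosine_and_corr(pca_scores(X, center=True))

# Orthogonality depends on the raw score values; correlation depends on
# the score values after their own means are removed.
print("no centering: cosine =", cos_raw, " correlation =", corr_raw)
print("centering:    cosine =", cos_ctr, " correlation =", corr_ctr)
```

In both runs the cosine is zero up to floating-point error, because the score columns are projections onto eigenvectors of a symmetric matrix. Only the centered run also drives the correlation to zero, which is the point the report makes about the documentation sentence.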