[ https://issues.apache.org/jira/browse/SYSTEMML-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frederick Reiss updated SYSTEMML-1146:
--------------------------------------
    Assignee: Prithviraj Sen

> Improve PCA description in documentation
> ----------------------------------------
>
>                 Key: SYSTEMML-1146
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1146
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Deron Eriksson
>            Assignee: Prithviraj Sen
>            Priority: Minor
>
> David P. Nichols reports that the first sentence of the PCA description in 
> the Algorithms Reference is inaccurate 
> (http://apache.github.io/incubator-systemml/algorithms-matrix-factorization.html#principal-component-analysis).
> "Principal Component Analysis (PCA) is a simple, non-parametric method to 
> transform the given data set with possibly correlated columns into a set of 
> linearly uncorrelated or orthogonal columns, called principal components." 
> The problem with this statement is that principal component scores typically
> will not be uncorrelated unless the input data have been centered (or already
> had column means of 0). Orthogonal and uncorrelated are not the same thing.
> Whether two vectors are orthogonal depends on their raw values, while
> covariance, and hence correlation, depends on their centered values.
> It looks like the text was taken from Wikipedia's Principal component
> analysis entry. Whoever wrote that part of the entry seems to assume that
> principal component analysis always works on a matrix of centered (or
> centered and scaled) data, but that is not always the case. The default in
> SystemML is not to center input columns, so the resulting columns will
> typically be orthogonal but not uncorrelated.
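The difference between orthogonal and uncorrelated scores can be checked numerically. The sketch below is a standalone Python illustration, not SystemML's PCA script. It assumes exactly two input columns, so the eigenvectors of the 2x2 cross-product matrix can be obtained from a single rotation angle. The helper names (`pca_scores_2d`, `dot`, `pearson`) and the sample data are hypothetical. Without centering, the two score columns should have a zero dot product (orthogonal) but a nonzero Pearson correlation. With centering, both should be zero up to rounding.

```python
import math


def pca_scores_2d(rows, center):
    """Hypothetical helper: project 2-column data onto the eigenvectors of
    X^T X, where X is the raw data (center=False) or the column-centered
    data (center=True)."""
    n = len(rows)
    if center:
        m0 = sum(r[0] for r in rows) / n
        m1 = sum(r[1] for r in rows) / n
        rows = [(r[0] - m0, r[1] - m1) for r in rows]
    # Entries of the symmetric 2x2 cross-product matrix X^T X = [[a, b], [b, c]].
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    c = sum(r[1] * r[1] for r in rows)
    # The rotation with tan(2*theta) = 2b / (a - c) diagonalizes X^T X,
    # so its columns v1 and v2 are the eigenvectors (principal axes).
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    # Principal component scores: each row projected onto v1 and v2.
    return [(r[0] * v1[0] + r[1] * v1[1], r[0] * v2[0] + r[1] * v2[1]) for r in rows]


def dot(u, v):
    """Inner product of the raw values: zero means the columns are orthogonal."""
    return sum(x * y for x, y in zip(u, v))


def pearson(u, v):
    """Pearson correlation, computed from the centered values of u and v."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    return cov / (su * sv)


# Hypothetical sample data whose columns have nonzero means.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (6.0, 4.0)]

for center in (False, True):
    scores = pca_scores_2d(rows, center)
    pc1 = [s[0] for s in scores]
    pc2 = [s[1] for s in scores]
    print(f"center={center}: dot={dot(pc1, pc2):+.1e}, corr={pearson(pc1, pc2):+.3f}")
```

The two-column restriction keeps the example dependency-free. For wider data, a general eigendecomposition or SVD routine would replace the rotation-angle step, but the orthogonal-versus-uncorrelated behavior is the same.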



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
