Ismael,

as far as I can see, the sklearn.decomposition.PCA documentation doesn't mention scaling at all (except for the whiten parameter, which is post-transformation scaling).

So since it isn't mentioned, it makes sense that PCA doesn't do any scaling of the input, just like np.linalg.svd. It does, however, center the data first.

You can verify that PCA and np.linalg.svd yield the same results (up to the sign of each component, since singular vectors are only defined up to a sign) with

```
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.random.RandomState(42).rand(10, 4)
>>> n_components = 2
>>> PCA(n_components, svd_solver='full').fit_transform(X)
```

and

```
>>> Xc = X - X.mean(axis=0)  # PCA centers but does not scale
>>> U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
>>> Xc.dot(Vt[:n_components].T)
```
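Putting the two snippets above into one self-contained script (this is only a sketch of the comparison; the assertion hedges against the per-component sign ambiguity of the SVD by comparing absolute values):

```python
import numpy as np
from sklearn.decomposition import PCA

# Reproducible random data: 10 samples, 4 features
X = np.random.RandomState(42).rand(10, 4)
n_components = 2

# PCA centers the data internally but does not scale it
T_pca = PCA(n_components, svd_solver='full').fit_transform(X)

# Equivalent projection via a plain SVD of the centered matrix
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T_svd = Xc.dot(Vt[:n_components].T)

# The two agree up to a sign flip of each component, because each
# singular vector is only determined up to its sign (PCA resolves
# this with an internal sign convention).
assert np.allclose(np.abs(T_pca), np.abs(T_svd))
```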

--
Roman

On 16/10/17 03:42, Ismael Lemhadri wrote:
Dear all,
The help file for the PCA class is unclear about the preprocessing
performed on the data.
You can check on line 410 here:
https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410
that the matrix is centered but NOT scaled, before performing the
singular value decomposition.
However, the help files do not make any mention of it.
This is confusing for someone who, like me, simply wanted to check
that PCA and np.linalg.svd give the same results. In academic settings,
students are often asked to compare different methods and to verify that
they yield the same results. I expect that many students have run into
this problem before...
Best,
Ismael Lemhadri


_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
