Oh, never mind my previous email: while the components should be the
same, the projection of the data points onto those components would still
be affected by centering vs. not centering, I guess.

Best,
Sebastian
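
(For concreteness, here is a small NumPy sketch of that point; the random
data and variable names below are illustration, not from the thread. The
covariance matrix, and hence its eigenvectors, come out identical either
way, while the projections differ by a constant per-component offset of
mean @ V.)

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(10, 4)
Xc = X - X.mean(axis=0)

# np.cov centers internally, so both calls give the same matrix,
# and therefore the same eigenvectors (components).
C = np.cov(X, rowvar=False)
assert np.allclose(C, np.cov(Xc, rowvar=False))
_, V = np.linalg.eigh(C)

# The projections, however, differ by a constant shift per component.
shift = X.mean(axis=0) @ V
assert np.allclose(X @ V, Xc @ V + shift)
```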

> On Oct 16, 2017, at 3:25 PM, Sebastian Raschka <[email protected]> wrote:
> 
> Hi,
> 
> if you compute the principal components (i.e., the eigendecomposition of
> the covariance matrix), it shouldn't matter whether the data is centered
> or not, since the covariance matrix is computed as
> 
> CovMat = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}) (x_i - \bar{x})^T
> 
> where \bar{x} is the vector of feature means.
> 
> So, if you center the data prior to computing the covariance matrix,
> \bar{x} is simply 0.
> 
> Best,
> Sebastian
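
(A minimal numerical check of the formula above, with random data assumed;
note the 1/n normalization as written, whereas np.cov would use 1/(n-1).)

```python
import numpy as np

rng = np.random.RandomState(42)
X = rng.rand(10, 4)
n = X.shape[0]
xbar = X.mean(axis=0)

# CovMat = (1/n) * sum_i (x_i - xbar)(x_i - xbar)^T
cov_raw = sum(np.outer(x - xbar, x - xbar) for x in X) / n

# After centering, xbar is 0 and the same formula gives the same matrix.
Xc = X - xbar
cov_centered = sum(np.outer(x, x) for x in Xc) / n
assert np.allclose(cov_raw, cov_centered)
```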
> 
>> On Oct 16, 2017, at 2:27 PM, Ismael Lemhadri <[email protected]> wrote:
>> 
>> @Andreas Muller: 
>> My references do not assume centering, e.g. 
>> http://ufldl.stanford.edu/wiki/index.php/PCA
>> any reference?
>> 
>> 
>> 
>> On Mon, Oct 16, 2017 at 10:20 AM, <[email protected]> wrote:
>> Send scikit-learn mailing list submissions to
>>         [email protected]
>> 
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         https://mail.python.org/mailman/listinfo/scikit-learn
>> or, via email, send a message with subject or body 'help' to
>>         [email protected]
>> 
>> You can reach the person managing the list at
>>         [email protected]
>> 
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of scikit-learn digest..."
>> 
>> 
>> Today's Topics:
>> 
>>    1. Re: unclear help file for sklearn.decomposition.pca
>>       (Andreas Mueller)
>> 
>> 
>> ----------------------------------------------------------------------
>> 
>> Message: 1
>> Date: Mon, 16 Oct 2017 13:19:57 -0400
>> From: Andreas Mueller <[email protected]>
>> To: [email protected]
>> Subject: Re: [scikit-learn] unclear help file for
>>         sklearn.decomposition.pca
>> Message-ID: <[email protected]>
>> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>> 
>> The definition of PCA has a centering step, but no scaling step.
>> 
>> On 10/16/2017 11:16 AM, Ismael Lemhadri wrote:
>> > Dear Roman,
>> > My concern is actually not about not mentioning the scaling but about
>> > not mentioning the centering.
>> > That is, the sklearn PCA removes the mean but it does not mention it
>> > in the help file.
>> > This was quite messy for me to debug, as I expected it to either (1)
>> > center and scale simultaneously, or (2) neither center nor scale.
>> > In my opinion, it would be beneficial to make the behavior explicit
>> > in the help file.
>> > Ismael
>> >
>> > On Mon, Oct 16, 2017 at 8:02 AM, <[email protected]> wrote:
>> >
>> >
>> >
>> >     Today's Topics:
>> >
>> >       1. unclear help file for sklearn.decomposition.pca (Ismael
>> >          Lemhadri)
>> >       2. Re: unclear help file for sklearn.decomposition.pca
>> >          (Roman Yurchak)
>> >       3. Question about LDA's coef_ attribute (Serafeim Loukas)
>> >       4. Re: Question about LDA's coef_ attribute (Alexandre Gramfort)
>> >       5. Re: Question about LDA's coef_ attribute (Serafeim Loukas)
>> >
>> >
>> >     ----------------------------------------------------------------------
>> >
>> >     Message: 1
>> >     Date: Sun, 15 Oct 2017 18:42:56 -0700
>> >     From: Ismael Lemhadri <[email protected]>
>> >     To: [email protected]
>> >     Subject: [scikit-learn] unclear help file for
>> >             sklearn.decomposition.pca
>> >     Message-ID:
>> >             <CANpSPFTgv+Oz7f97dandmrBBayqf_o9w=18okhcfn0u5dnz...@mail.gmail.com>
>> >     Content-Type: text/plain; charset="utf-8"
>> >
>> >     Dear all,
>> >     The help file for the PCA class is unclear about the preprocessing
>> >     performed on the data.
>> >     You can check on line 410 here:
>> >     https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410
>> >     that the matrix is centered but NOT scaled before performing the
>> >     singular value decomposition.
>> >     However, the help files do not make any mention of it.
>> >     This is unclear for someone who, like me, just wanted to check that
>> >     PCA and np.linalg.svd give the same results. In academic settings,
>> >     students are often asked to compare different methods and to check
>> >     that they yield the same results. I expect that many students have
>> >     confronted this problem before...
>> >     Best,
>> >     Ismael Lemhadri
>> >
>> >     ------------------------------
>> >
>> >     Message: 2
>> >     Date: Mon, 16 Oct 2017 15:16:45 +0200
>> >     From: Roman Yurchak <[email protected]>
>> >     To: Scikit-learn mailing list <[email protected]>
>> >     Subject: Re: [scikit-learn] unclear help file for
>> >             sklearn.decomposition.pca
>> >     Message-ID: <[email protected]>
>> >     Content-Type: text/plain; charset=utf-8; format=flowed
>> >
>> >     Ismael,
>> >
>> >     as far as I can see, sklearn.decomposition.PCA doesn't mention
>> >     scaling at all (except for the whiten parameter, which is
>> >     post-transformation scaling).
>> >
>> >     So since it doesn't mention it, it makes sense that it doesn't do
>> >     any scaling of the input, same as np.linalg.svd.
>> >
>> >     You can verify that PCA and np.linalg.svd yield the same results with
>> >
>> >     ```
>> >     >>> import numpy as np
>> >     >>> from sklearn.decomposition import PCA
>> >     >>> X = np.random.RandomState(42).rand(10, 4)
>> >     >>> n_components = 2
>> >     >>> PCA(n_components, svd_solver='full').fit_transform(X)
>> >     ```
>> >
>> >     and
>> >
>> >     ```
>> >     >>> U, s, V = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
>> >     >>> (X - X.mean(axis=0)).dot(V[:n_components].T)
>> >     ```
>> >
>> >     --
>> >     Roman
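
(The equivalence Roman shows rests on a standard SVD identity: for centered
data Xc = U S Vh, projecting onto the top right singular vectors gives the
scaled left singular vectors U S, which is what fit_transform returns up to
per-component sign conventions. A pure-NumPy sketch, with the same assumed
random data:)

```python
import numpy as np

rng = np.random.RandomState(42)
X = rng.rand(10, 4)
Xc = X - X.mean(axis=0)
n_components = 2

# np.linalg.svd returns the right singular vectors as rows of V (i.e. Vh).
U, s, V = np.linalg.svd(Xc, full_matrices=False)

# Xc @ V[:k].T == U[:, :k] * s[:k] -- the PCA scores.
scores = Xc @ V[:n_components].T
assert np.allclose(scores, U[:, :n_components] * s[:n_components])
```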
>> >
>> >     On 16/10/17 03:42, Ismael Lemhadri wrote:
>> >     > Dear all,
>> >     > The help file for the PCA class is unclear about the preprocessing
>> >     > performed to the data.
>> >     > You can check on line 410 here:
>> >     > https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410
>> >     > that the matrix is centered but NOT scaled, before performing the
>> >     > singular value decomposition.
>> >     > However, the help files do not make any mention of it.
>> >     > This is unclear for someone who, like me, just wanted to compare
>> >     > that the PCA and np.linalg.svd give the same results. In academic
>> >     > settings, students are often asked to compare different methods and
>> >     > to check that they yield the same results. I expect that many
>> >     > students have confronted this problem before...
>> >     > Best,
>> >     > Ismael Lemhadri
>> >     >
>> >     >
>> >     > _______________________________________________
>> >     > scikit-learn mailing list
>> >     > [email protected] <mailto:[email protected]> 
>> > <mailto:[email protected] <mailto:[email protected]>>
>> >     > https://mail.python.org/mailman/listinfo/scikit-learn 
>> > <https://mail.python.org/mailman/listinfo/scikit-learn>
>> >     <https://mail.python.org/mailman/listinfo/scikit-learn 
>> > <https://mail.python.org/mailman/listinfo/scikit-learn>>
>> >     >
>> >
>> >
>> >
>> >     ------------------------------
>> >
>> >     Message: 3
>> >     Date: Mon, 16 Oct 2017 15:27:48 +0200
>> >     From: Serafeim Loukas <[email protected]>
>> >     To: [email protected]
>> >     Subject: [scikit-learn] Question about LDA's coef_ attribute
>> >     Message-ID: <[email protected]>
>> >     Content-Type: text/plain; charset="us-ascii"
>> >
>> >     Dear Scikit-learn community,
>> >
>> >     Since the documentation of the LDA
>> >     (http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
>> >     is not so clear, I would like to ask if the lda.coef_ attribute
>> >     stores the eigenvectors from the SVD decomposition.
>> >
>> >     Thank you in advance,
>> >     Serafeim
>> >
>> >     ------------------------------
>> >
>> >     Message: 4
>> >     Date: Mon, 16 Oct 2017 16:57:52 +0200
>> >     From: Alexandre Gramfort <[email protected]>
>> >     To: Scikit-learn mailing list <[email protected]>
>> >     Subject: Re: [scikit-learn] Question about LDA's coef_ attribute
>> >     Message-ID:
>> >             <cadeotzricoqhuhjmmw2z14cqffeqyndyoxn-ogkavtmq7v0...@mail.gmail.com>
>> >     Content-Type: text/plain; charset="UTF-8"
>> >
>> >     No, it stores the direction of the decision function, to match the
>> >     API of the linear models.
>> >
>> >     HTH
>> >     Alex
>> >
>> >     On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas
>> >     <[email protected]> wrote:
>> >     > Dear Scikit-learn community,
>> >     >
>> >     > Since the documentation of the LDA
>> >     > (http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
>> >     > is not so clear, I would like to ask if the lda.coef_ attribute
>> >     > stores the eigenvectors from the SVD decomposition.
>> >     >
>> >     > Thank you in advance,
>> >     > Serafeim
>> >     >
>> >     >
>> >
>> >
>> >     ------------------------------
>> >
>> >     Message: 5
>> >     Date: Mon, 16 Oct 2017 17:02:46 +0200
>> >     From: Serafeim Loukas <[email protected]>
>> >     To: Scikit-learn mailing list <[email protected]>
>> >     Subject: Re: [scikit-learn] Question about LDA's coef_ attribute
>> >     Message-ID: <[email protected]>
>> >     Content-Type: text/plain; charset="us-ascii"
>> >
>> >     Dear Alex,
>> >
>> >     Thank you for the prompt response.
>> >
>> >     Are the eigenvectors stored in some variable?
>> >     Does the lda.scalings_ attribute contain the eigenvectors?
>> >
>> >     Best,
>> >     Serafeim
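
(For what it's worth, a quick way to inspect these attributes; this sketch
assumes the iris data from sklearn.datasets and the attribute shapes of
current scikit-learn: coef_ has one row per class for the decision
function, while transform() projects onto at most n_classes - 1
discriminant axes via scalings_.)

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X, y)

# coef_ backs the linear decision function: one row per class.
assert lda.coef_.shape == (3, 4)

# transform() maps the 4 features onto the discriminant subspace;
# with 3 classes there are at most 2 discriminant axes.
assert lda.transform(X).shape == (150, 2)
print(lda.scalings_.shape)
```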
>> >
>> >     > On 16 Oct 2017, at 16:57, Alexandre Gramfort <[email protected]> wrote:
>> >     >
>> >     > No, it stores the direction of the decision function, to match the
>> >     > API of the linear models.
>> >     >
>> >     > HTH
>> >     > Alex
>> >     >
>> >     > On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas <[email protected]> wrote:
>> >     >> Dear Scikit-learn community,
>> >     >>
>> >     >> Since the documentation of the LDA
>> >     >> (http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
>> >     >> is not so clear, I would like to ask if the lda.coef_ attribute
>> >     >> stores the eigenvectors from the SVD decomposition.
>> >     >>
>> >     >> Thank you in advance,
>> >     >> Serafeim
>> >     >>
>> >
>> >
>> >     ------------------------------
>> >
>> >
>> >
>> >     ------------------------------
>> >
>> >     End of scikit-learn Digest, Vol 19, Issue 25
>> >     ********************************************
>> >
>> >
>> >
>> >
>> 
>> 
>> ------------------------------
>> 
>> 
>> 
>> ------------------------------
>> 
>> End of scikit-learn Digest, Vol 19, Issue 28
>> ********************************************
>> 
> 