The definition of PCA has a centering step, but no scaling step.
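
For instance, a minimal sketch (the random data and the StandardScaler step are
only there to make the contrast explicit): shifting the input does not change
the result, but rescaling it does.

```
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(20, 3) * [1.0, 10.0, 100.0]  # columns on very different scales

pca = PCA(n_components=2, svd_solver='full')

# Centering happens internally, so shifting the data changes nothing ...
print(np.allclose(pca.fit_transform(X), pca.fit_transform(X + 5.0)))  # True

# ... but there is no scaling step: standardizing the columns first
# gives a different projection.
X_std = StandardScaler().fit_transform(X)
print(np.allclose(np.abs(pca.fit_transform(X)),
                  np.abs(pca.fit_transform(X_std))))  # False
```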

On 10/16/2017 11:16 AM, Ismael Lemhadri wrote:
Dear Roman,
My concern is actually not about the scaling going unmentioned, but about the centering going unmentioned. That is, the sklearn PCA removes the mean, but the help file does not say so. This was quite messy for me to debug, as I expected it to either 1/ center and scale simultaneously, or 2/ neither center nor scale. In my opinion it would be beneficial to make the behavior explicit in the help file.
Ismael

On Mon, Oct 16, 2017 at 8:02 AM, <scikit-learn-requ...@python.org> wrote:

    Send scikit-learn mailing list submissions to
    scikit-learn@python.org

    To subscribe or unsubscribe via the World Wide Web, visit
    https://mail.python.org/mailman/listinfo/scikit-learn
    or, via email, send a message with subject or body 'help' to
    scikit-learn-requ...@python.org

    You can reach the person managing the list at
    scikit-learn-ow...@python.org

    When replying, please edit your Subject line so it is more specific
    than "Re: Contents of scikit-learn digest..."


    Today's Topics:

       1. unclear help file for sklearn.decomposition.pca (Ismael
    Lemhadri)
       2. Re: unclear help file for sklearn.decomposition.pca
          (Roman Yurchak)
       3. Question about LDA's coef_ attribute (Serafeim Loukas)
       4. Re: Question about LDA's coef_ attribute (Alexandre Gramfort)
       5. Re: Question about LDA's coef_ attribute (Serafeim Loukas)


    ----------------------------------------------------------------------

    Message: 1
    Date: Sun, 15 Oct 2017 18:42:56 -0700
    From: Ismael Lemhadri <lemha...@stanford.edu>
    To: scikit-learn@python.org
    Subject: [scikit-learn] unclear help file for
            sklearn.decomposition.pca

    Dear all,
    The help file for the PCA class is unclear about the preprocessing
    performed on the data.
    You can check on line 410 here:
    https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410
    that the matrix is centered but NOT scaled before performing the
    singular value decomposition.
    However, the help file does not make any mention of it.
    This is confusing for someone who, like me, just wanted to check
    that PCA and np.linalg.svd give the same results. In academic
    settings, students are often asked to compare different methods and
    to check that they yield the same results. I expect that many
    students have run into this problem before...
    Best,
    Ismael Lemhadri

    ------------------------------

    Message: 2
    Date: Mon, 16 Oct 2017 15:16:45 +0200
    From: Roman Yurchak <rth.yurc...@gmail.com>
    To: Scikit-learn mailing list <scikit-learn@python.org>
    Subject: Re: [scikit-learn] unclear help file for
            sklearn.decomposition.pca

    Ismael,

    As far as I can see, sklearn.decomposition.PCA doesn't mention
    scaling at all (except for the whiten parameter, which is
    post-transformation scaling).

    So since it doesn't mention it, it makes sense that it doesn't do
    any scaling of the input, same as np.linalg.svd.

    You can verify that PCA and np.linalg.svd yield the same results, with

    ```
    >>> import numpy as np
    >>> from sklearn.decomposition import PCA
    >>> X = np.random.RandomState(42).rand(10, 4)
    >>> n_components = 2
    >>> PCA(n_components, svd_solver='full').fit_transform(X)
    ```

    and

    ```
    >>> U, s, V = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    >>> (X - X.mean(axis=0)).dot(V[:n_components].T)
    ```
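
    Putting the two snippets together, a minimal end-to-end check (each
    principal direction is only defined up to a sign, so the comparison
    below uses absolute values):

    ```
    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.RandomState(42).rand(10, 4)
    n_components = 2

    # Projection computed by scikit-learn's PCA (which centers X internally)
    T_pca = PCA(n_components, svd_solver='full').fit_transform(X)

    # The same projection by hand: center X, take the SVD, project onto
    # the first right singular vectors
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T_svd = Xc.dot(Vt[:n_components].T)

    # Each component is only defined up to a sign flip, so compare magnitudes
    print(np.allclose(np.abs(T_pca), np.abs(T_svd)))  # True
    ```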

    --
    Roman




    ------------------------------

    Message: 3
    Date: Mon, 16 Oct 2017 15:27:48 +0200
    From: Serafeim Loukas <seral...@gmail.com>
    To: scikit-learn@python.org
    Subject: [scikit-learn] Question about LDA's coef_ attribute

    Dear Scikit-learn community,

    Since the documentation of the LDA
    (http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
    is not so clear, I would like to ask if the lda.coef_ attribute
    stores the eigenvectors from the SVD decomposition.

    Thank you in advance,
    Serafeim

    ------------------------------

    Message: 4
    Date: Mon, 16 Oct 2017 16:57:52 +0200
    From: Alexandre Gramfort <alexandre.gramf...@inria.fr>
    To: Scikit-learn mailing list <scikit-learn@python.org>
    Subject: Re: [scikit-learn] Question about LDA's coef_ attribute

    No, it stores the direction of the decision function, to match the
    API of linear models.

    HTH
    Alex
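
    In other words, coef_ together with intercept_ parameterizes the
    decision function, like in other linear classifiers. A minimal
    sketch (iris is used only for illustration):

    ```
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)
    lda = LinearDiscriminantAnalysis(solver='svd').fit(X, y)

    # decision_function is the affine map defined by coef_ and intercept_
    print(np.allclose(lda.decision_function(X),
                      X @ lda.coef_.T + lda.intercept_))  # True
    ```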



    ------------------------------

    Message: 5
    Date: Mon, 16 Oct 2017 17:02:46 +0200
    From: Serafeim Loukas <seral...@gmail.com>
    To: Scikit-learn mailing list <scikit-learn@python.org>
    Subject: Re: [scikit-learn] Question about LDA's coef_ attribute

    Dear Alex,

    Thank you for the prompt response.

    Are the eigenvectors stored in some variable?
    Does the lda.scalings_ attribute contain the eigenvectors?
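
    A minimal sketch of the kind of check I have in mind, assuming the
    default 'svd' solver (which stores the overall mean in xbar_):

    ```
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)
    lda = LinearDiscriminantAnalysis(solver='svd').fit(X, y)

    # If this holds, scalings_ is the projection matrix that transform()
    # applies to the centered data with the 'svd' solver.
    print(np.allclose(lda.transform(X),
                      (X - lda.xbar_) @ lda.scalings_))  # True
    ```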

    Best,
    Serafeim



    ------------------------------

    Subject: Digest Footer

    _______________________________________________
    scikit-learn mailing list
    scikit-learn@python.org
    https://mail.python.org/mailman/listinfo/scikit-learn


    ------------------------------

    End of scikit-learn Digest, Vol 19, Issue 25
    ********************************************




_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn

