Re: [scikit-learn] Sprint discussion points?

Nicolas Hug Thu, 14 Feb 2019 08:46:00 -0800

> or we could go as far as to schedule meetings on the different topics.


Given the number of issues to discuss, this is probably the best approach, IMO.

On 2/14/19 8:31 AM, Andreas Mueller wrote:


As I said, I think it's too much and we need to prioritize.

We could either rank issues and start with some and see how far we get, or we could go as far as to schedule meetings on the different topics.


Also, I'll be only arriving Tuesday late morning, I think.


On 2/14/19 8:05 AM, Adrin wrote:

I've been working on some bias mitigation metrics and methods, and that use case involves changing the data, as well as up/down sampling, inside a transformer. Almost all of those methods also need sample properties for the observations to work. I'm trying to make them "sklearn compatible", but for now it's pretty hacky. So I'd be happy if

we discuss the union of what Joel and Andy suggest.

Cheers,
Adrin.

On Thu, Feb 14, 2019, 11:47 Guillaume Lemaître <[email protected]> wrote:


    I am really interested in the union of the list given by Andy and
    Joel.

    I'd like to have some discussion of the "impute" module.
    Compared to the other topics, though, it is not a
    high-priority discussion.

    On Thu, 14 Feb 2019 at 05:31, Joel Nothman
    <[email protected]> wrote:

        Convergence in logistic regression
        (https://github.com/scikit-learn/scikit-learn/issues/11536) is
        indeed one problem (and it presents a general issue of what
        max_iter means when you have several solvers, or how good
        defaults are selected). But I was sure we had problems with
        non-determinism on some platforms... but now I can't find them.

        > my students have basically no way to figure out what
        features the coefficients in their linear model correspond
        to, that seems a bit more important to me.

        Yes, I agree... Assuming coefficients are helpful, rather
        than using permutation-based measures of importance, for
        instance.

        I generally think a review of distances might be a good thing
        at some point, given the confusing triplication across
        sklearn.neighbors, sklearn.metrics.pairwise, scipy.spatial...
        and the fact that minkowski with p=2 is not implemented the same way as euclidean.
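A small illustration of the kind of numerical discrepancy at stake (a sketch in plain Python, not scikit-learn's actual code): the `euclidean_distances` docstring says it computes distances through the expansion dot(x, x) - 2 * dot(x, y) + dot(y, y) for efficiency, and warns that this can suffer from catastrophic cancellation. A direct sum of squared differences, which is what a generic Minkowski computation with p=2 amounts to, does not have that problem. The helper names below are hypothetical.

```python
import math


def squared_dist_direct(x, y):
    """Hypothetical helper: sum of squared coordinate differences (Minkowski-style, p=2)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def squared_dist_expanded(x, y):
    """Hypothetical helper: ||x||^2 - 2 x.y + ||y||^2, the dot-product expansion."""
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    xy = sum(a * b for a, b in zip(x, y))
    return xx - 2 * xy + yy


def dist(squared):
    # Clip tiny negative values that rounding can produce before taking the root.
    return math.sqrt(max(squared, 0.0))


# Two nearby points far from the origin: their true distance is 1.0.
x = [1e8]
y = [1e8 + 1.0]
print(dist(squared_dist_direct(x, y)))    # 1.0
print(dist(squared_dist_expanded(x, y)))  # 0.0, the large terms cancel and the 1.0 is lost
```

The expansion is attractive because the cross term is a single matrix product when computing many pairs at once, which is why it trades some precision for speed.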


        On Thu, 14 Feb 2019 at 12:56, Andreas Mueller
        <[email protected]> wrote:

            Do you have a reference for the logistic regression
            stability? Is it convergence warnings?

            Happy to discuss the other two issues, though they
            seem easier than most of what's on my list.

            I have no idea what's going on with OPTICS tbh, and I'll
            leave it up to you and the others to decide whether
            that's something we should discuss.
            I can try to read up and weigh in but that might not be
            the most effective way to do it.

            Sample props is something I left out because I
            personally don't feel it's a priority compared to all the
            other things;
            my students have basically no way to figure out what
            features the coefficients in their linear model
            correspond to, that seems a bit more important to me.

            We can put it on the discussion list again, but I'm not
            super enthusiastic about it.

            How should we prioritize things?


            On 2/13/19 8:08 PM, Joel Nothman wrote:

            Yes, I was thinking the same. I think there are some
            other core issues to solve, such as:

            * euclidean_distances numerical issues
            * commitment to ARM testing and debugging
            * logistic regression stability

            We should also nut out OPTICS issues or remove it from
            0.21. I'm still keen on trying to work out sample props
            (supporting weighted scoring at least), but perhaps I'm
            being persuaded this will never be a top-priority
            requirement, and the solutions add much complexity.

            On Thu, 14 Feb 2019 at 07:39, Andreas Mueller
            <[email protected]> wrote:

                Hey all.

                Should we collect some discussion points for the sprint?

                There's an unusual number of core devs present and I
                think we should seize the opportunity.
                Maybe we should create a page in the wiki or add it
                to the sprint page?

                Things that are high on my list of priorities are:

                  * slicing pipelines
                  * add get_feature_names to pipelines
                  * freezing estimator
                  * faster multi-metric scoring
                  * fit_transform doing something other than
                    fit.transform
                  * imbalanced-learn interface / subsampling in pipelines
                  * Specifying search spaces and valid hyperparameters
                    (https://github.com/scikit-learn/scikit-learn/issues/13031).
                  * allowing EstimatorCV-style speed-up in GridSearches
                  * storing pandas column names and using them as
                    feature names


                Trying to discuss all of these might be too much,
                but maybe we can figure out a subset and make sure
                we have SLEPs to discuss?
                Most of these issues are on the roadmap; issue 13031
                is related to #18 but not directly on the roadmap.

                Thanks,
                Andy
                _______________________________________________
                scikit-learn mailing list
                [email protected] <mailto:[email protected]>
                https://mail.python.org/mailman/listinfo/scikit-learn


--Guillaume Lemaitre

    INRIA Saclay - Parietal team
    Center for Data Science Paris-Saclay
    https://glemaitre.github.io/