Re: [scikit-learn] Bootstrapping in sklearn

2018-09-23 Thread Daniel Saxton
Thanks, Olivier. We will try adding examples to show how it can be used in
conjunction with sklearn to generate confidence intervals on linear model
parameters, as well as prediction intervals for other classes of models.

On Thu, Sep 20, 2018, 11:55 AM Olivier Grisel wrote:

> I believe it would fit in sklearn-contrib even if it's geared more toward
> statistical inference than machine-learning-style prediction.
>
> Others might disagree.
>
> Anyway, joining efforts to improve documentation, CI, testing and so on
> is always a good thing for your future users.
>
> --
> Olivier
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>


Re: [scikit-learn] Bootstrapping in sklearn

2018-09-20 Thread Daniel Saxton
Olivier,

I got in touch with Constantine from the scikits-bootstrap package and he's
interested in merging the two projects.  If we were to get some
documentation together, do you feel that there is potential for inclusion
as an sklearn-contrib package?  I believe we would have most of the other
requirements (testing, continuous integration, etc.), but is there anything
else that you feel is missing?

Thanks,
Daniel

On Tue, Sep 18, 2018 at 2:42 AM Olivier Grisel wrote:

> This looks like a very useful project.
>
> There is also scikits-bootstrap [1]. Personally I prefer the flat package
> namespace of resample (I am not a fan of the 'scikits' namespace package),
> but I still think it would be great to contact the author to find out
> whether he would be interested in joining efforts.
>
> What both projects currently lack is good Sphinx-based documentation that
> explains, in a couple of paragraphs with examples, what the different
> non-parametric inference methods are and what the pros and cons of each
> are (sample complexity, computational complexity, kinds of inference,
> bias, theoretical asymptotic results, practical discrepancies observed in
> the finite-sample setting, assumptions made about the distribution of the
> data...). Ideally the docs would also link to examples (using
> sphinx-gallery) that highlight the behavior of the tools in both nominal
> and pathological cases.
>
> [1] https://github.com/cgevans/scikits-bootstrap
>
> --
> Olivier


Re: [scikit-learn] Bootstrapping in sklearn

2018-09-18 Thread Daniel Saxton
J.B.,

Any help would certainly be welcome, no matter how slow.  I appreciate the
interest.

Thanks,
Daniel

On Tue, Sep 18, 2018, 8:47 AM Brown J.B. via scikit-learn <
scikit-learn@python.org> wrote:

> Resampling is a very important and interesting contribution that relates
> very closely to my primary research in applied ML for chemical development.
> I'd be very interested in contributing documentation and learning new
> things along the way, but I might be perceived as slow because I am
> juggling many projects and responsibilities.
> (I failed once before to review a PR for multi-metric optimization for
> 0.19 in a timely manner.)
> If that is still acceptable, please let me know, and I'm happy to try to help.
>
> J.B.
>
>
> Tue, Sep 18, 2018, 20:37 Daniel Saxton:
>
>> Great, I went ahead and contacted Constantine.  Documentation was
>> actually the next thing that I wanted to work on, so hopefully he and I can
>> put something together.
>>
>> Thanks for the help.


Re: [scikit-learn] Bootstrapping in sklearn

2018-09-18 Thread Daniel Saxton
Great, I went ahead and contacted Constantine.  Documentation was actually
the next thing that I wanted to work on, so hopefully he and I can put
something together.

Thanks for the help.



[scikit-learn] Bootstrapping in sklearn

2018-09-17 Thread Daniel Saxton
Hi all,

As everyone knows, sklearn is excellent for building predictive models, but
an area where I believe there is still work to be done is in quantifying the
uncertainty inherent in those models.  (The rise in popularity of
probabilistic programming is, I believe, evidence that there is an appetite
for this.)  We can, for example, easily find point estimates for the
coefficients of linear models in sklearn, but we cannot draw inferences from
those point estimates without some measure of their probable error.
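
A minimal sketch of the gap described above, assuming synthetic data and
the existing sklearn.utils.resample helper (this is not the API of the
resample package, which is not shown in this thread). It computes a
percentile bootstrap confidence interval for the coefficients of a linear
model; the variable names and the choice of 500 replicates are illustrative
assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.utils import resample

# Synthetic regression data (an assumption for illustration only).
rng = np.random.RandomState(0)
n_samples = 200
X = rng.normal(size=(n_samples, 2))
true_coef = np.array([2.0, -1.0])
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# Point estimate: this is all that LinearRegression reports on its own.
point_estimate = LinearRegression().fit(X, y).coef_

# Nonparametric bootstrap: refit on rows drawn with replacement.
n_boot = 500
boot_coefs = np.empty((n_boot, X.shape[1]))
for b in range(n_boot):
    X_b, y_b = resample(X, y, replace=True, random_state=b)
    boot_coefs[b] = LinearRegression().fit(X_b, y_b).coef_

# 95% percentile interval for each coefficient.
lower, upper = np.percentile(boot_coefs, [2.5, 97.5], axis=0)
print("point estimate:", point_estimate)
print("95% CI lower:  ", lower)
print("95% CI upper:  ", upper)
```

The percentile interval is only one of several bootstrap intervals; it is
used here because it needs nothing beyond the replicates themselves.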

To address this and other problems I wrote a package called resample,
which implements the bootstrap and other randomization-based procedures
with the goal of performing largely nonparametric statistical inference on
a wide class of problems.  The package is built entirely on numpy and scipy
and so already integrates fairly well with sklearn.  There is a tutorial
here which, among other things, shows applications using the Boston housing
data: https://github.com/dsaxton/resample/blob/master/doc/resample.ipynb

Might there be interest in including something like this as an
sklearn-contrib package?  Essentially we are taking what is already in
sklearn.utils.resample and extending it to include other forms of the
bootstrap (e.g., balanced, parametric, stratified and / or smoothed),
algorithms for computing automatic confidence intervals, and procedures for
doing nonparametric, randomization-based hypothesis testing.
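
As a rough illustration of the randomization-based hypothesis testing
mentioned above, here is a numpy-only sketch of a two-sided permutation test
for a difference in means. The function name permutation_test_mean_diff, its
parameters, and the synthetic data are hypothetical and are not taken from
the resample package:

```python
import numpy as np


def permutation_test_mean_diff(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])

    # Under the null hypothesis the group labels are exchangeable, so
    # shuffle the pooled data and recompute the statistic each time.
    count = 0
    for _ in range(n_permutations):
        permuted = rng.permutation(pooled)
        diff = permuted[:a.size].mean() - permuted[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1

    # Add one to numerator and denominator so the p-value is never zero.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic groups whose true means differ by one standard deviation.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=1.0, scale=1.0, size=50)
group_b = rng.normal(loc=0.0, scale=1.0, size=50)
observed, p_value = permutation_test_mean_diff(group_a, group_b)
print("observed difference:", observed)
print("p-value:", p_value)
```

The test makes no distributional assumption beyond exchangeability of the
observations under the null hypothesis.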

Here is the GitHub page:

https://github.com/dsaxton/resample

Of course, I also would greatly appreciate any input that others might have
on ways that this package could be made more useful.

Thanks,
Daniel