I would propose PipeGraph for stacking: it comes naturally, and it could help
a lot in making things easier for core developers.

Disclaimer: I'm a coauthor of PipeGraph.


Manuel Castejón Limas

Escuela de Ingenierías Industrial, Informática y Aeroespacial

Universidad de León

Campus de Vegazana sn.

24071. León. Spain.

e-mail: manuel.caste...@unileon.es

Tel.: +34 987 291 779



Confidentiality Notice <https://www.unileon.es/mail-disclaimer/20180525>




On Tue, 2 Oct 2018 at 3:13, Jason Sanchez (<2jasonsanc...@gmail.com>)
wrote:

> The current roadmap is amazing. One feature that would be exciting is
> better support for multilayer stacking with caching and the ability to add
> models to already trained layers.
>
> I saw this history: https://github.com/scikit-learn/scikit-learn/pull/8960
>
> This library is very close:
> * The API is somewhat awkward but otherwise good. It does not cache
> intermediate steps. https://wolpert.readthedocs.io/en/latest/index.html
>
> These solutions seem to allow only two layers:
> * https://github.com/scikit-learn/scikit-learn/issues/4816#issuecomment-217817717
> * https://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/
> * https://github.com/scikit-learn/scikit-learn/pull/6674
>
> The people who put these other libraries together have made an incredibly
> welcome effort to solve a real need and it would be amazing to see a payoff
> for their effort in the form of an addition of stacking to scikit-learn's
> core library.
>
> As another data point, I attached a simple implementation I put together
> to illustrate what I think are core needs of this feature. Feel free to
> browse the code. Here is the short list:
> * Unlimited layers (or at least 3 ;) )
> * Choice of cross-validation (CV) or out-of-bag (OOB) predictions for each model
> * Ability to add a new model to a layer after the stacked ensemble has
> been trained and refit the pipeline such that only models that must be
> retrained are retrained (i.e. train the added model and retrain all models
> in higher layers)
> * All the standard scikit-learn pipeline goodness (introspection, grid search,
> serializability, etc.)
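A rough sketch of the feature list above, written with plain NumPy and scikit-learn. It is neither PipeGraph's API nor the implementation attached to the original message (which is not reproduced here). `CachedStack` and `_meta_features` are hypothetical names. The sketch assumes classification with `predict_proba`, feeds each layer only the previous layer's predictions (no passthrough of the raw features), and its "oob" mode assumes a bagging estimator that accepts `oob_score=True`, such as `RandomForestClassifier`. Meta-features are cached in memory, so `add_model` trains only the new model and then refits the layers above it. It is not a full scikit-learn estimator (no `get_params`), so grid search would not work on it as written.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier


def _meta_features(estimator, mode, X, y, cv):
    """Fit a clone of `estimator`; return it with out-of-sample class probabilities."""
    est = clone(estimator)
    if mode == "cv":
        # Out-of-fold probabilities, then a refit on all rows for predict time.
        preds = cross_val_predict(est, X, y, cv=cv, method="predict_proba")
        est.fit(X, y)
    elif mode == "oob":
        # Out-of-bag probabilities; assumes a bagging model (e.g. a random forest).
        est.set_params(oob_score=True).fit(X, y)
        preds = est.oob_decision_function_
    else:
        raise ValueError(f"unknown mode {mode!r}")
    return est, preds


class CachedStack:
    """Hypothetical multilayer stacking classifier with cached meta-features."""

    def __init__(self, layers, final_estimator, cv=5):
        self.layers = [list(layer) for layer in layers]  # each: [(estimator, mode), ...]
        self.final_estimator = final_estimator
        self.cv = cv

    def fit(self, X, y):
        self.y_ = y
        self.inputs_ = [X]   # cached input matrix of every layer (plus final input)
        self.fitted_ = []    # fitted estimators, one list per layer
        self.columns_ = []   # cached meta-feature blocks, one list per layer
        self._fit_from(0)
        return self

    def _fit_from(self, start):
        # Drop cached results from layer `start` upward, keep everything below it.
        del self.inputs_[start + 1:], self.fitted_[start:], self.columns_[start:]
        for models in self.layers[start:]:
            Z = self.inputs_[-1]
            results = [_meta_features(e, m, Z, self.y_, self.cv) for e, m in models]
            self.fitted_.append([est for est, _ in results])
            self.columns_.append([cols for _, cols in results])
            self.inputs_.append(np.hstack(self.columns_[-1]))
        self.final_ = clone(self.final_estimator).fit(self.inputs_[-1], self.y_)

    def add_model(self, layer, estimator, mode="cv"):
        # Train only the new model, on the cached input of its layer ...
        est, cols = _meta_features(estimator, mode, self.inputs_[layer], self.y_, self.cv)
        self.layers[layer].append((estimator, mode))
        self.fitted_[layer].append(est)
        self.columns_[layer].append(cols)
        self.inputs_[layer + 1] = np.hstack(self.columns_[layer])
        # ... then refit every higher layer and the final estimator.
        self._fit_from(layer + 1)
        return self

    def predict(self, X):
        Z = X
        for fitted in self.fitted_:
            Z = np.hstack([est.predict_proba(Z) for est in fitted])
        return self.final_.predict(Z)


X, y = make_classification(n_samples=300, random_state=0)
lr = LogisticRegression(max_iter=1000)
stack = CachedStack(
    layers=[
        [(RandomForestClassifier(n_estimators=100, random_state=0), "oob"), (lr, "cv")],
        [(DecisionTreeClassifier(max_depth=3, random_state=0), "cv")],
        [(lr, "cv")],
    ],
    final_estimator=lr,
).fit(X, y)

# Add a tree to the first layer: the first layer's existing models are not
# refit; only the new tree, the two layers above it and the final model are.
stack.add_model(0, DecisionTreeClassifier(max_depth=5, random_state=0), mode="cv")
pred = stack.predict(X)
```

The caching here is only in memory; persisting the cached blocks (for example with `joblib.Memory`) would be a further step.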
>
> Thanks all! This library is making a real difference for good in the lives
> of many people.
>
> Jason
>
>
> On Fri, Sep 28, 2018 at 11:35 AM <scikit-learn-requ...@python.org> wrote:
>
>> Today's Topics:
>>
>>    1. Re: [ANN] Scikit-learn 0.20.0 (Sebastian Raschka)
>>    2. Re: [ANN] Scikit-learn 0.20.0 (Andreas Mueller)
>>    3. Re: [ANN] Scikit-learn 0.20.0 (Andreas Mueller)
>>    4. Re: [ANN] Scikit-learn 0.20.0 (Manuel CASTEJÓN LIMAS)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 28 Sep 2018 11:10:50 -0500
>> From: Sebastian Raschka <m...@sebastianraschka.com>
>> To: Scikit-learn mailing list <scikit-learn@python.org>
>> Subject: Re: [scikit-learn] [ANN] Scikit-learn 0.20.0
>> Message-ID:
>>         <efc21eb5-ccc4-48cb-aeb5-2c938689e...@sebastianraschka.com>
>> Content-Type: text/plain;       charset=us-ascii
>>
>> >
>> > > I think model serialization should be a priority.
>> >
>> > There is also the ONNX specification that is gaining industrial
>> adoption and that already includes open source exporters for several
>> families of scikit-learn models:
>> >
>> > https://github.com/onnx/onnxmltools
>>
>>
>> Didn't know about that. This is really nice! What do you think about
>> referring to it under
>> http://scikit-learn.org/stable/modules/model_persistence.html to make
>> people aware that this option exists?
>> Would be happy to add a PR.
>>
>> Best,
>> Sebastian
>>
>>
>>
>> > On Sep 28, 2018, at 9:30 AM, Olivier Grisel <olivier.gri...@ensta.org>
>> wrote:
>> > [...]
>> > --
>> > Olivier
>> > _______________________________________________
>> > scikit-learn mailing list
>> > scikit-learn@python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Fri, 28 Sep 2018 13:38:39 -0400
>> From: Andreas Mueller <t3k...@gmail.com>
>> To: scikit-learn@python.org
>> Subject: Re: [scikit-learn] [ANN] Scikit-learn 0.20.0
>> Message-ID: <96edd381-2352-f183-486a-b86e395a7...@gmail.com>
>> Content-Type: text/plain; charset=utf-8; format=flowed
>>
>>
>>
>> On 09/28/2018 12:10 PM, Sebastian Raschka wrote:
>> > [...]
>> >
>> I don't think an open source runtime has been announced yet (or they
>> didn't email me like they promised lol).
>> I'm quite excited about this as well.
>>
>> Javier:
>> The problem is not so much storing the "model" but storing how to make
>> predictions. Different versions could act differently
>> on the same data structure - and the data structure could change. Both
>> happen in scikit-learn.
>> So if you want to make sure the right thing happens across versions, you
>> either need to provide serialization and deserialization for
>> every version and conversion between those or you need to provide a way
>> to store the prediction function,
>> which basically means you need a Turing-complete language (that's what
>> ONNX does).
>>
>> We basically said doing the first is not feasible within scikit-learn
>> given our current amount of resources, and no one
>> has even tried doing it outside of scikit-learn (which would be possible).
>> Implementing a complete prediction serialization language (the second
>> option) is definitely outside the scope of sklearn.
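A minimal sketch of why plain pickling is version-bound, assuming joblib persistence as described on the model persistence page: store the scikit-learn version next to the pickled model and refuse to hand the model back under a different version. `save_with_version` and `load_checked` are hypothetical helper names. This only detects a mismatch; it does not convert between versions, which is the part described above as infeasible.

```python
import os
import tempfile

import joblib
import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def save_with_version(model, path):
    # Pickle the model together with the scikit-learn version that produced it.
    joblib.dump({"sklearn_version": sklearn.__version__, "model": model}, path)


def load_checked(path):
    # Unpickle, then refuse to return a model built by a different version:
    # its internal data structures or prediction logic may differ.
    payload = joblib.load(path)
    saved = payload["sklearn_version"]
    if saved != sklearn.__version__:
        raise RuntimeError(
            f"model saved with scikit-learn {saved}, running {sklearn.__version__}"
        )
    return payload["model"]


X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    save_with_version(model, path)
    restored = load_checked(path)
```

Note that the check runs only after unpickling, so a sufficiently different version may already fail or warn inside `joblib.load` itself.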
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Fri, 28 Sep 2018 13:41:13 -0400
>> From: Andreas Mueller <t3k...@gmail.com>
>> To: scikit-learn@python.org
>> Subject: Re: [scikit-learn] [ANN] Scikit-learn 0.20.0
>> Message-ID: <4cfbb327-7489-70ff-8fa3-a21079ec0...@gmail.com>
>> Content-Type: text/plain; charset=utf-8; format=flowed
>>
>>
>>
>> On 09/28/2018 01:38 PM, Andreas Mueller wrote:
>> > [...]
>> >
>> Maybe we should add to the FAQ why serialization is hard?
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Fri, 28 Sep 2018 20:34:43 +0200
>> From: Manuel CASTEJÓN LIMAS <mc...@unileon.es>
>> To: Scikit-learn user and developer mailing list
>>         <scikit-learn@python.org>
>> Subject: Re: [scikit-learn] [ANN] Scikit-learn 0.20.0
>> Message-ID:
>>         <CAAQ3=
>> ufntyo02ykr9ywrcjicb8a3cutpn47l4myzwxennyp...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> How about a Docker-based approach? Just thinking out loud.
>> Best
>> Manuel
>>
>> On Fri, 28 Sep 2018 19:43, Andreas Mueller <t3k...@gmail.com>
>> wrote:
>>
>> > [...]
>> ------------------------------
>>
>> End of scikit-learn Digest, Vol 30, Issue 25
>> ********************************************
>>