I would propose PipeGraph for stacking: it comes naturally, and it could make things much easier for core developers.
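To make the stacking discussion concrete, here is a minimal sketch of multilayer stacking with out-of-fold predictions. It uses plain scikit-learn (`cross_val_predict`), not PipeGraph's API. The helper names `fit_layer`, `transform_layer`, `fit_stack` and `predict_stack` are hypothetical, made up for illustration; the sketch assumes binary classification and does no caching or partial refitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier


def fit_layer(models, Z, y, cv=5):
    """Fit one layer; return the fitted models and their out-of-fold features."""
    # Out-of-fold probabilities keep the next layer from seeing leaked labels.
    oof = [
        cross_val_predict(m, Z, y, cv=cv, method="predict_proba")[:, 1]
        for m in models
    ]
    # cross_val_predict works on clones, so refit each model on all the data.
    fitted = [m.fit(Z, y) for m in models]
    return fitted, np.column_stack(oof)


def transform_layer(fitted, Z):
    """Turn inputs into next-layer features using already fitted models."""
    return np.column_stack([m.predict_proba(Z)[:, 1] for m in fitted])


def fit_stack(layers, final, X, y, cv=5):
    """Fit every layer in order, then fit the final estimator on top."""
    fitted_layers, Z = [], X
    for models in layers:
        fitted, Z = fit_layer(models, Z, y, cv=cv)
        fitted_layers.append(fitted)
    final.fit(Z, y)
    return fitted_layers, final


def predict_stack(fitted_layers, final, X):
    """Push new data through all layers and predict with the final estimator."""
    Z = X
    for fitted in fitted_layers:
        Z = transform_layer(fitted, Z)
    return final.predict(Z)


X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, y_train, X_new = X[:200], y[:200], X[200:]

layers = [
    [LogisticRegression(max_iter=1000),
     RandomForestClassifier(n_estimators=50, random_state=0)],
    [LogisticRegression(max_iter=1000),
     DecisionTreeClassifier(max_depth=3, random_state=0)],
]
fitted_layers, final = fit_stack(
    layers, LogisticRegression(max_iter=1000), X_train, y_train
)
pred = predict_stack(fitted_layers, final, X_new)
```

Adding a model to an existing layer would, in this sketch, mean refitting that layer and everything above it; avoiding redundant refits is exactly where caching would have to come in.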
Disclaimer: I'm a coauthor of PipeGraph.

Manuel Castejón Limas
Escuela de Ingenierías Industrial, Informática y Aeroespacial
Universidad de León
Campus de Vegazana sn.
24071. León. Spain.
e-mail: manuel.caste...@unileon.es
Tel.: +34 987 291 779

Confidentiality Notice <https://www.unileon.es/mail-disclaimer/20180525>

On Tue, Oct 2, 2018 at 3:13, Jason Sanchez (<2jasonsanc...@gmail.com>) wrote:

> The current roadmap is amazing. One feature that would be exciting is
> better support for multilayer stacking with caching and the ability to add
> models to already trained layers.
>
> I saw this history: https://github.com/scikit-learn/scikit-learn/pull/8960
>
> This library is very close:
> * API is somewhat awkward, but otherwise good. Does not cache intermediate
>   steps. https://wolpert.readthedocs.io/en/latest/index.html
>
> These solutions seem to allow only two layers:
> * https://github.com/scikit-learn/scikit-learn/issues/4816#issuecomment-217817717
> * https://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/
> * https://github.com/scikit-learn/scikit-learn/pull/6674
>
> The people who put these other libraries together have made an incredibly
> welcome effort to solve a real need and it would be amazing to see a payoff
> for their effort in the form of an addition of stacking to scikit-learn's
> core library.
>
> As another data point, I attached a simple implementation I put together
> to illustrate what I think are core needs of this feature. Feel free to
> browse the code. Here is the short list:
> * Infinite layers (or at least 3 ;) )
> * Choice of CV or OOB for each model
> * Ability to add a new model to a layer after the stacked ensemble has
>   been trained and refit the pipeline such that only models that must be
>   retrained are retrained (i.e. train the added model and retrain all
>   models in higher layers)
> * All standard scikit-learn pipeline goodness (introspection, grid search,
>   serializability, etc.)
>
> Thanks all! This library is making a real difference for good in the lives
> of many people.
>
> Jason
>
> On Fri, Sep 28, 2018 at 11:35 AM <scikit-learn-requ...@python.org> wrote:
>
>> Today's Topics:
>>
>>    1. Re: [ANN] Scikit-learn 0.20.0 (Sebastian Raschka)
>>    2. Re: [ANN] Scikit-learn 0.20.0 (Andreas Mueller)
>>    3. Re: [ANN] Scikit-learn 0.20.0 (Andreas Mueller)
>>    4. Re: [ANN] Scikit-learn 0.20.0 (Manuel CASTEJÓN LIMAS)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 28 Sep 2018 11:10:50 -0500
>> From: Sebastian Raschka <m...@sebastianraschka.com>
>> To: Scikit-learn mailing list <scikit-learn@python.org>
>> Subject: Re: [scikit-learn] [ANN] Scikit-learn 0.20.0
>> Message-ID: <efc21eb5-ccc4-48cb-aeb5-2c938689e...@sebastianraschka.com>
>> Content-Type: text/plain; charset=us-ascii
>>
>> > > I think model serialization should be a priority.
>> >
>> > There is also the ONNX specification that is gaining industrial
>> > adoption and that already includes open source exporters for several
>> > families of scikit-learn models:
>> >
>> > https://github.com/onnx/onnxmltools
>>
>> Didn't know about that. This is really nice!
>> What do you think about referring to it under
>> http://scikit-learn.org/stable/modules/model_persistence.html to make
>> people aware that this option exists?
>> Would be happy to add a PR.
>>
>> Best,
>> Sebastian
>>
>> > On Sep 28, 2018, at 9:30 AM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
>> >
>> > > I think model serialization should be a priority.
>> >
>> > There is also the ONNX specification that is gaining industrial
>> > adoption and that already includes open source exporters for several
>> > families of scikit-learn models:
>> >
>> > https://github.com/onnx/onnxmltools
>> >
>> > --
>> > Olivier
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Fri, 28 Sep 2018 13:38:39 -0400
>> From: Andreas Mueller <t3k...@gmail.com>
>> To: scikit-learn@python.org
>> Subject: Re: [scikit-learn] [ANN] Scikit-learn 0.20.0
>> Message-ID: <96edd381-2352-f183-486a-b86e395a7...@gmail.com>
>> Content-Type: text/plain; charset=utf-8; format=flowed
>>
>> On 09/28/2018 12:10 PM, Sebastian Raschka wrote:
>> >>> I think model serialization should be a priority.
>> >> There is also the ONNX specification that is gaining industrial
>> >> adoption and that already includes open source exporters for several
>> >> families of scikit-learn models:
>> >>
>> >> https://github.com/onnx/onnxmltools
>> >
>> > Didn't know about that. This is really nice! What do you think about
>> > referring to it under
>> > http://scikit-learn.org/stable/modules/model_persistence.html to make
>> > people aware that this option exists?
>> > Would be happy to add a PR.
>>
>> I don't think an open source runtime has been announced yet (or they
>> didn't email me like they promised lol).
>> I'm quite excited about this as well.
>>
>> Javier:
>> The problem is not so much storing the "model" but storing how to make
>> predictions. Different versions could act differently
>> on the same data structure - and the data structure could change. Both
>> happen in scikit-learn.
>> So if you want to make sure the right thing happens across versions, you
>> either need to provide serialization and deserialization for
>> every version and conversion between those or you need to provide a way
>> to store the prediction function,
>> which basically means you need a Turing-complete language (that's what
>> ONNX does).
>>
>> We basically said doing the first is not feasible within scikit-learn
>> given our current amount of resources, and no-one
>> has even tried doing it outside of scikit-learn (which would be possible).
>> Implementing a complete prediction serialization language (the second
>> option) is definitely outside the scope of sklearn.
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Fri, 28 Sep 2018 13:41:13 -0400
>> From: Andreas Mueller <t3k...@gmail.com>
>> To: scikit-learn@python.org
>> Subject: Re: [scikit-learn] [ANN] Scikit-learn 0.20.0
>> Message-ID: <4cfbb327-7489-70ff-8fa3-a21079ec0...@gmail.com>
>> Content-Type: text/plain; charset=utf-8; format=flowed
>>
>> Maybe we should add to the FAQ why serialization is hard?
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Fri, 28 Sep 2018 20:34:43 +0200
>> From: Manuel CASTEJÓN LIMAS <mc...@unileon.es>
>> To: Scikit-learn user and developer mailing list <scikit-learn@python.org>
>> Subject: Re: [scikit-learn] [ANN] Scikit-learn 0.20.0
>> Message-ID: <CAAQ3=ufntyo02ykr9ywrcjicb8a3cutpn47l4myzwxennyp...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> How about a docker based approach? Just thinking out loud
>> Best
>> Manuel
>>
>> ------------------------------
>>
>> End of scikit-learn Digest, Vol 30, Issue 25
>> ********************************************
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn