Congrats everyone, this is awesome!!! I just started teaching an ML course this 
semester and introduced scikit-learn this week -- it was great timing to 
demonstrate how well maintained the library is and praise all the efforts that 
go into it :). 

> I think model serialization should be a priority.


While this could potentially be a bit inefficient for large non-parametric models, 
I think the serialization into a text-readable format has some advantages for 
real-world use cases. E.g., sharing models (pickle is a bit problematic because 
of security issues) in applications but also as supplementary material in 
archives for accompanying research articles, etc (esp in cases where datasets 
cannot be shared in their original form due to some copyright or other 
concerns).

Chris Emmery, Chris Wagner and I toyed around with JSON a while back 
(https://cmry.github.io/notes/serialize), and it could be feasible -- but yeah, 
it will involve some work, especially with testing things thoroughly for all 
kinds of estimators. Maybe this could somehow be automated though in a 
grid-search kind of way with a build matrix for estimators and parameters once 
a general framework has been developed. 
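
To make the JSON idea a bit more concrete, here is a minimal sketch (not the
code from the notes linked above). It assumes that an estimator is fully
described by its constructor parameters plus the fitted attributes ending in
an underscore, and that all of those are plain Python values or NumPy arrays.
That holds for simple models such as LogisticRegression and GaussianNB, but
not for estimators in general. The names estimator_to_json,
estimator_from_json and REGISTRY are made up for illustration and are not part
of scikit-learn. The loop at the end is a tiny version of the "build matrix":
fit each estimator/parameter combination, round-trip it through JSON, and
check that the predictions are unchanged.

```python
import json

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Hypothetical whitelist of classes we are willing to rebuild. Looking the
# class up here, rather than importing it by name, keeps loading from
# executing arbitrary code.
REGISTRY = {cls.__name__: cls for cls in (LogisticRegression, GaussianNB)}


def estimator_to_json(est):
    """Serialize constructor params and fitted attributes to JSON text."""
    fitted = {}
    for name, value in vars(est).items():
        # scikit-learn convention: attributes learned in fit() end with "_".
        if name.startswith("_") or not name.endswith("_"):
            continue
        if isinstance(value, np.ndarray):
            fitted[name] = {"dtype": str(value.dtype), "data": value.tolist()}
        elif isinstance(value, np.generic):
            # NumPy scalar -> plain Python scalar.
            fitted[name] = {"dtype": None, "data": value.item()}
        else:
            fitted[name] = {"dtype": None, "data": value}
    payload = {
        "class": type(est).__name__,
        "params": est.get_params(deep=False),
        "fitted": fitted,
    }
    # json.dumps raises TypeError for anything it cannot represent, so
    # unsupported estimators fail loudly instead of being written partially.
    return json.dumps(payload, indent=2)


def estimator_from_json(text):
    """Rebuild an estimator from the text produced by estimator_to_json."""
    payload = json.loads(text)
    est = REGISTRY[payload["class"]](**payload["params"])
    for name, item in payload["fitted"].items():
        value = item["data"]
        if item["dtype"] is not None:
            value = np.asarray(value, dtype=item["dtype"])
        setattr(est, name, value)
    return est


# A tiny "build matrix" of estimators and parameters to round-trip.
X, y = load_iris(return_X_y=True)
matrix = [
    (LogisticRegression, {"C": 0.1, "max_iter": 1000}),
    (LogisticRegression, {"C": 10.0, "max_iter": 1000}),
    (GaussianNB, {}),
]
for cls, params in matrix:
    model = cls(**params).fit(X, y)
    text = estimator_to_json(model)
    restored = estimator_from_json(text)
    assert np.array_equal(restored.predict(X), model.predict(X)), cls.__name__
```

Anything that stores non-trivial objects (e.g. the tree_ attribute of decision
trees) would make json.dumps fail here, and I suspect that is where most of
the actual work would be.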


> On Sep 27, 2018, at 6:22 PM, Javier López <jlo...@ende.cc> wrote:
> 
> First of all, congratulations on the release, great work, everyone!
> 
> I think model serialization should be a priority. Particularly, 
> I think that (whenever practical) there should be a way of 
> serializing estimators (either unfitted or fitted) in a text-readable format,
> preferably JSON or PMML/PFA (or several others).
> 
> Obviously for some models it is not practical (eg random forests with 
> thousands of trees), but for simpler situations I believe it would
> provide a great tool for model sharing without the dangers of pickling
> and the versioning hell.
> 
> I am (painfully) aware that when rebuilding a model on a different setup,
> it might yield different results; in my company we address that by saving a
> reasonably small validation dataset and its predictions alongside the
> serialized model. Upon deserializing, we check that the rebuilt model
> reproduces the predictions within some acceptable range.
> 
> About the new release, I am particularly happy about the joblib update,
> as it has been a major source of pain for me over the last year. On that
> note, I think it would be a good idea to stop vendoring joblib and list it as
> a dependency instead; wheels, pip and conda are mature enough to 
> handle the situation nowadays.
> 
> Last, but not least, it would be great to relax the checks concerning nans 
> at prediction time, and allow, for instance, that an estimator yields nans if
> any features are nans; we face that situation when working with ensembles,
> where a few of the submodels might not get enough features available, but
> the rest do.  
> 
> Off the top of my head, that's all, keep up the fantastic work!
> J
> 
> On Thu, Sep 27, 2018 at 6:31 PM Andreas Mueller <t3k...@gmail.com> wrote:
> I think we should work on the formatting, make sure it's complete, link it to
> issues / PRs, and then make this into a public document on the website and
> request feedback.
> 
> Right now it's in a format that is understandable for core developers, but
> some of the things are not clear to the average audience. Linking the issues /
> PRs will help with that a bit, but we might also want to add a sentence to each
> point in the roadmap.
> 
> I had some issues with the formatting, I'll try to fix that later.
> Any volunteers for adding the frozen estimator (or has someone added that 
> already?).
> 
> Cheers,
> Andy
> 
> 
> On 09/27/2018 04:29 AM, Olivier Grisel wrote:
>> On Wed, Sep 26, 2018 at 23:02, Joel Nothman <joel.noth...@gmail.com> wrote:
>> And for those interested in what's in the pipeline, we are trying to draft a 
>> roadmap... 
>> https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018
>> 
>> But there are no doubt many features that are absent there too.
>> 
>> Indeed, it would be great to get some feedback on this roadmap from heavy 
>> scikit-learn users: which points do you think are the most important? What 
>> is missing from this roadmap?
>> 
>> Feel free to reply to this thread.
>> 
>> -- 
>> Olivier
>> 
>> 
>> _______________________________________________
>> scikit-learn mailing list
>> 
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
> 

