On 09/28/2018 01:38 PM, Andreas Mueller wrote:


On 09/28/2018 12:10 PM, Sebastian Raschka wrote:
I think model serialization should be a priority.
There is also the ONNX specification that is gaining industrial adoption and that already includes open source exporters for several families of scikit-learn models:

https://github.com/onnx/onnxmltools

Didn't know about that. This is really nice! What do you think about referring to it under http://scikit-learn.org/stable/modules/model_persistence.html to make people aware that this option exists?
Would be happy to add a PR.


I don't think an open source runtime has been announced yet (or they didn't email me like they promised lol).
I'm quite excited about this as well.

Javier:
The problem is not so much storing the "model" as storing how to make predictions. Different versions could act differently on the same data structure, and the data structure itself could change. Both happen in scikit-learn. So if you want to make sure the right thing happens across versions, you either need to provide serialization and deserialization for every version plus conversion between them, or you need to provide a way to store the prediction function, which basically means you need a Turing-complete language (that's what ONNX does).

We basically said doing the first is not feasible within scikit-learn given our current resources, and no one has even tried doing it outside of scikit-learn (which would be possible). Implementing a complete prediction serialization language (the second option) is definitely outside the scope of sklearn.
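
To make the two options concrete, here is a toy sketch in plain Python. It is not scikit-learn or ONNX code: the function names, the JSON layout, and the tiny expression format are all made up for illustration, and the expression format is nowhere near a complete language. It only shows the difference between storing parameters (the reader must know what they mean for each version) and storing the prediction function itself (the reader only needs a generic evaluator).

```python
import json

# Option 1 (hypothetical layout): store fitted parameters plus a format version.
# Every reader has to know how each version's parameters turn into predictions.
def save_params_v1(coef, intercept):
    return json.dumps({"format_version": 1, "coef": coef, "intercept": intercept})

def predict_from_params(blob, x):
    state = json.loads(blob)
    if state["format_version"] != 1:
        # A real implementation would need a loader/converter per version.
        raise ValueError("no loader for format version %r" % state["format_version"])
    return sum(c * v for c, v in zip(state["coef"], x)) + state["intercept"]

# Option 2 (hypothetical toy format): store the prediction function itself
# as a small expression tree that a generic evaluator can run.
def export_linear_graph(coef, intercept):
    terms = [["mul", ["const", c], ["input", i]] for i, c in enumerate(coef)]
    return json.dumps(["add", ["const", intercept]] + terms)

def evaluate(node, x):
    op, args = node[0], node[1:]
    if op == "const":
        return args[0]
    if op == "input":
        return x[args[0]]
    values = [evaluate(arg, x) for arg in args]
    if op == "add":
        return sum(values)
    if op == "mul":
        result = 1.0
        for value in values:
            result *= value
        return result
    raise ValueError("unknown op %r" % op)

coef, intercept, x = [2.0, -1.0], 0.5, [3.0, 4.0]
params_blob = save_params_v1(coef, intercept)
graph_blob = export_linear_graph(coef, intercept)

print(predict_from_params(params_blob, x))   # 2.5
print(evaluate(json.loads(graph_blob), x))   # 2.5
```

With option 1, changing what the stored parameters mean forces a new format version and a converter for the old one. With option 2, the stored tree keeps giving the same prediction as long as the evaluator still supports its operations, but someone has to design and maintain that operation set.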


Maybe we should add to the FAQ why serialization is hard?
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
