On 09/28/2018 01:38 PM, Andreas Mueller wrote:
On 09/28/2018 12:10 PM, Sebastian Raschka wrote:
I think model serialization should be a priority.
There is also the ONNX specification that is gaining industrial
adoption and that already includes open source exporters for several
families of scikit-learn models:
https://github.com/onnx/onnxmltools
Didn't know about that. This is really nice! What do you think about
referring to it under
http://scikit-learn.org/stable/modules/model_persistence.html to make
people aware that this option exists?
Would be happy to add a PR.
I don't think an open source runtime has been announced yet (or they
didn't email me like they promised lol).
I'm quite excited about this as well.
Javier:
The problem is not so much storing the "model" but storing how to make
predictions. Different versions could act differently on the same data
structure, and the data structure itself could change. Both happen in
scikit-learn.
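
To make that concrete, here is a minimal sketch in plain Python. The
two predict functions and the tie-breaking change between them are
hypothetical, not taken from any scikit-learn release; the point is
only that identical stored parameters can yield different predictions
once the prediction code changes.

```python
import json

# Hypothetical stored "model": learned parameters only, no prediction logic.
stored = json.dumps({"coef": [2.0, -1.0], "intercept": 0.0})


def _score(params, x):
    """Linear decision score shared by both hypothetical releases."""
    return sum(c * xi for c, xi in zip(params["coef"], x)) + params["intercept"]


def predict_v1(params, x):
    """Hypothetical older release: a score of exactly 0 maps to class 1."""
    return 1 if _score(params, x) >= 0 else 0


def predict_v2(params, x):
    """Hypothetical newer release: a score of exactly 0 maps to class 0."""
    return 1 if _score(params, x) > 0 else 0


params = json.loads(stored)
x = [1.0, 2.0]  # score = 2*1 - 1*2 + 0 = 0, i.e. exactly on the boundary

# Same stored data, different answers depending on which code reads it.
print(predict_v1(params, x), predict_v2(params, x))
```

The stored file round-trips perfectly in both cases, so nothing about
the serialization itself would flag the discrepancy.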
So if you want to make sure the right thing happens across versions,
you have two options. Either you provide serialization and
deserialization for every version, plus conversion between those
versions, or you provide a way to store the prediction function itself,
which basically means you need a Turing-complete language (that's what
ONNX does).
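
A rough sketch of the second option, storing the prediction function
rather than the fitted object. The op names, the JSON layout, and the
helpers `export_logistic` and `run` are all invented for illustration;
this is far simpler than ONNX and is not its actual format. It only
shows the idea of exporting a list of operations that a small runtime
can replay without the training library.

```python
import json
import math

# Hypothetical operator set; a real format would define a much larger,
# versioned one.
OPS = {
    "dot": lambda x, w: sum(a * b for a, b in zip(x, w)),
    "add": lambda x, b: x + b,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
}


def export_logistic(coef, intercept):
    """Serialize the prediction steps of a logistic model as a list of ops."""
    graph = [
        {"op": "dot", "args": [coef]},
        {"op": "add", "args": [intercept]},
        {"op": "sigmoid", "args": []},
    ]
    return json.dumps(graph)


def run(serialized, x):
    """Tiny runtime: apply each stored op in order to the running value."""
    value = x
    for node in json.loads(serialized):
        value = OPS[node["op"]](value, *node["args"])
    return value


graph = export_logistic([2.0, -1.0], 0.5)
probability = run(graph, [1.0, 1.0])
print(probability)
```

The runtime only has to keep the meaning of each op stable; it never
needs to know which estimator class produced the graph.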
We basically said doing the first is not feasible within scikit-learn
given our current amount of resources, and no one has even tried doing
it outside of scikit-learn (which would be possible).
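
For a sense of what the first option involves, here is a minimal
sketch of a version-tagged format with one converter per format bump.
The field names, the format numbers, and the changes between them are
made up for the example; a real project would need one such chain per
estimator, which is where the maintenance cost comes from.

```python
import json


def v1_to_v2(payload):
    """Hypothetical change: the "weights" field was renamed to "coef"."""
    return {
        "format": 2,
        "coef": payload["weights"],
        "intercept": payload["intercept"],
    }


def v2_to_v3(payload):
    """Hypothetical change: a "classes" field was added with a default."""
    return {
        "format": 3,
        "coef": payload["coef"],
        "intercept": payload["intercept"],
        "classes": [0, 1],
    }


# One converter per version step; loading walks the chain up to the present.
MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_FORMAT = 3


def load(serialized):
    """Deserialize a payload and upgrade it step by step to CURRENT_FORMAT."""
    payload = json.loads(serialized)
    while payload["format"] < CURRENT_FORMAT:
        payload = MIGRATIONS[payload["format"]](payload)
    return payload


old = json.dumps({"format": 1, "weights": [2.0, -1.0], "intercept": 0.0})
model = load(old)
print(model)
```

Note that this only migrates the stored data structure. It does not
help when the prediction code itself changes behavior between versions.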
Implementing a complete prediction serialization language (the second
option) is definitely outside the scope of sklearn.
Maybe we should add to the FAQ why serialization is hard?