On Fri, Sep 28, 2018 at 1:03 AM Sebastian Raschka <m...@sebastianraschka.com>
wrote:

> Chris Emmery, Chris Wagner and I toyed around with JSON a while back (
> https://cmry.github.io/notes/serialize), and it could be feasible


I came across your notes a while back; they were really useful!
I hacked a variation of it that didn't need to know the model class in
advance:
https://gist.github.com/jlopezpena/2cdd09c56afda5964990d5cf278bfd31
but it is VERY hackish, and it doesn't work with complex models with nested
components. (At work we use a further variation of this that also works on
pipelines and some specific nested stuff, like `mlxtend`'s
`SequentialFeatureSelector`)


> but yeah, it will involve some work, especially with testing things
> thoroughly for all kinds of estimators. Maybe this could somehow be
> automated though in a grid-search kind of way with a build matrix for
> estimators and parameters once a general framework has been developed.
>

I considered making this serialization into an external project, but I
think this would be much easier if estimators provided a dunder method
`__serialize__` (or whatever) that would handle the idiosyncrasies of each
particular family. I don't believe there will be a "one-size-fits-all"
solution for this problem. This approach would also make it possible to
work on it incrementally, raising a default `NotImplementedError` for
estimators that haven't been addressed yet.
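
To make the idea concrete, here is a rough sketch of how such a hook could look. Everything in it is hypothetical: `SketchBase`, `ToyLinearModel`, `ToyTreeModel` and `to_json` are made-up names for illustration, not scikit-learn classes or functions, and the dictionary layout is just one possible choice.

```python
import json


class SketchBase:
    """Hypothetical shared base class (NOT scikit-learn's BaseEstimator)."""

    def __serialize__(self):
        # Default: estimators that have not been addressed yet fail loudly.
        raise NotImplementedError(
            f"{type(self).__name__} does not support serialization yet"
        )


class ToyLinearModel(SketchBase):
    """Toy estimator whose state is a coefficient list and an intercept."""

    def __init__(self, coef, intercept=0.0):
        self.coef = coef
        self.intercept = intercept

    def __serialize__(self):
        # Each family decides which attributes make up its state.
        return {
            "type": type(self).__name__,
            "coef": list(self.coef),
            "intercept": self.intercept,
        }


class ToyTreeModel(SketchBase):
    """Toy estimator that has not been given a __serialize__ yet."""


def to_json(estimator):
    # Generic entry point: delegate to the estimator's own hook.
    return json.dumps(estimator.__serialize__(), sort_keys=True)


print(to_json(ToyLinearModel(coef=[0.5, -1.0], intercept=2.0)))
try:
    to_json(ToyTreeModel())
except NotImplementedError as exc:
    print("skipped:", exc)
```

The generic `to_json` never needs to know about any particular family; it only calls the hook, so support can be added one estimator at a time.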

In the long run, I also believe that the "proper" way to do this is to
allow dumping entire processes into PFA: http://dmg.org/pfa/docs/motivation/
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
