For ONNX you may be interested in https://github.com/onnx/onnxmltools, which already supports converting a few sklearn models to ONNX.
However, as far as I am aware, none of the ONNX backends actually support the ONNX-ML extended spec (at least in open source). So I think you would not actually be able to do prediction...

As for PFA, to my current knowledge there is no library that does it yet. Our own Aardpfark project (https://github.com/CODAIT/aardpfark) focuses on SparkML export to PFA for now, but we would like to add sklearn support in the future.

On Wed, 3 Oct 2018 at 20:07 Sebastian Raschka <m...@sebastianraschka.com> wrote:
> The ONNX approach sounds most promising, esp. because it will also allow
> library interoperability, but I wonder if this is for parametric models
> only and not for the nonparametric ones like KNN, tree-based classifiers,
> etc.
>
> All in all, I can definitely see the appeal of having a way to export
> sklearn estimators in a text-based format (e.g., via JSON), since it would
> make sharing code easier. This doesn't even have to be compatible with
> multiple sklearn versions. A typical use case would be to include these
> JSON exports as, e.g., supplemental files of a research paper for other
> people to run the models etc. (here, one can just specify which sklearn
> version it would require; of course, one could also share pickle files,
> but I am personally always hesitant about running/trusting other people's
> pickle files).
>
> Unfortunately though, as Gael pointed out, this "feature" would be a huge
> burden for the devs, and it would probably also negatively impact the
> development of scikit-learn itself because it imposes another design
> constraint.
>
> However, I do think this sounds like an excellent case for a contrib
> project, like scikit-export, scikit-serialize or something like that.
>
> Best,
> Sebastian
>
> > On Oct 3, 2018, at 5:49 AM, Javier López <jlo...@ende.cc> wrote:
> >
> > On Tue, Oct 2, 2018 at 5:07 PM Gael Varoquaux <gael.varoqu...@normalesup.org> wrote:
> > > The reason that pickles are brittle and that sharing pickles is a bad
> > > practice is that pickles use an implicitly defined data model, which
> > > is defined via the internals of objects.
> >
> > Plus the fact that loading a pickle can execute arbitrary code, and
> > there is no way to know in advance whether any malicious code is in
> > there, because the contents of the pickle cannot be easily inspected
> > without loading/executing it.
> >
> > > So, the problems of pickle are not specific to pickle, but rather
> > > intrinsic to any generic persistence code [*]. Writing persistence
> > > code that does not fall into these problems is very costly in terms of
> > > developer time and makes it harder to add new methods or improve
> > > existing ones. I am not excited about it.
> >
> > My "text-based serialization" suggestion was nowhere near as ambitious
> > as that, as I have already explained, and wasn't aiming at solving the
> > versioning issues, but rather at having something which is "about as
> > good" as pickle but in a human-readable format. I am not asking for a
> > Turing-complete language to reproduce the prediction function, but
> > rather something simple in the spirit of the output produced by the gist
> > code I linked above, just for the model families where it is reasonable:
> >
> > https://gist.github.com/jlopezpena/2cdd09c56afda5964990d5cf278bfd31
> >
> > The code I posted mostly works (specific cases of nested models need to
> > be addressed separately, as well as pipelines), and we have been using
> > (a version of) it in production for quite some time.
> > But there are hackish aspects to it that we are not happy with, such as
> > the manual separation of init and fitted parameters by checking if the
> > name ends with "_", having to infer class name and location using
> > "model.__class__.__name__" and "model.__module__", and the wacky use of
> > "__import__".
> >
> > My suggestion was more along the lines of adding some metadata to
> > sklearn estimators so that code in a similar style would be nicer to
> > write; little things like having `init_parameters` and `fit_parameters`
> > properties that would return the lists of named parameters, or a
> > `model_info` method that would return data like sklearn version, class
> > name and location, or a package-level dictionary pointing at the
> > estimator classes by a string name, like
> >
> >     from sklearn.linear_model import LogisticRegression
> >     estimator_classes = {"LogisticRegression": LogisticRegression, ...}
> >
> > so that one can load the appropriate class from the string description
> > without calling __import__ or eval; that sort of stuff.
> >
> > I am aware this would not address the common complaint of "perfect
> > prediction reproducibility" across versions, but I think we can all
> > agree that this utopia of perfect reproducibility is not feasible.
> >
> > And in the long, long run, I agree that PFA/ONNX or whichever similar
> > format emerges is the way to go.
> >
> > J
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn@python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
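For illustration, here is a rough Python sketch of the kind of helper Javier describes: init parameters taken from `get_params()`, fitted attributes picked out by the trailing-underscore convention, and a name-to-class registry used instead of `__import__` or `eval`. It is only a sketch under assumptions, not the code from the gist: `ToyScaler`, `ESTIMATOR_CLASSES`, `to_json` and `from_json` are hypothetical names, not scikit-learn API, and `ToyScaler` merely mimics scikit-learn's conventions so the example runs without scikit-learn installed.

```python
import json


class ToyScaler:
    """Hypothetical stand-in following scikit-learn conventions: constructor
    arguments are returned by get_params(), and attributes learned in fit()
    end with a trailing underscore."""

    def __init__(self, with_mean=True):
        self.with_mean = with_mean

    def get_params(self, deep=False):
        return {"with_mean": self.with_mean}

    def fit(self, X):
        n_features = len(X[0])
        if self.with_mean:
            # Column means of X (a list of equal-length rows).
            self.mean_ = [sum(column) / len(X) for column in zip(*X)]
        else:
            self.mean_ = [0.0] * n_features
        return self

    def transform(self, X):
        return [[x - m for x, m in zip(row, self.mean_)] for row in X]


# Hypothetical registry in the spirit of the proposed `estimator_classes`
# dictionary: classes are looked up by name, never imported from a string.
ESTIMATOR_CLASSES = {"ToyScaler": ToyScaler}


def to_json(model):
    # Fitted attributes are found by the trailing-underscore convention;
    # anything with a tolist() method (e.g. NumPy arrays) becomes a list.
    fitted = {
        name: value.tolist() if hasattr(value, "tolist") else value
        for name, value in vars(model).items()
        if name.endswith("_") and not name.startswith("_")
    }
    document = {
        "class_name": type(model).__name__,
        "module": type(model).__module__,  # metadata only; loader uses the registry
        "init_params": model.get_params(deep=False),
        "fit_params": fitted,
    }
    return json.dumps(document, indent=2, sort_keys=True)


def from_json(text):
    document = json.loads(text)
    cls = ESTIMATOR_CLASSES[document["class_name"]]  # KeyError for unknown names
    model = cls(**document["init_params"])
    for name, value in document["fit_params"].items():
        setattr(model, name, value)
    return model


X = [[1.0, 2.0], [3.0, 6.0]]
restored = from_json(to_json(ToyScaler().fit(X)))
print(restored.transform(X))  # [[-1.0, -2.0], [1.0, 2.0]]
```

With real estimators, fitted attributes are usually NumPy arrays, so a loader would also have to turn the lists back into arrays (e.g. with `numpy.asarray`), and a fuller version would store `sklearn.__version__` next to the class name. Nested estimators and pipelines, which Javier notes need separate handling, are not covered here.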