I believe Openscoring is one of the better-known PMML serving frameworks in Java land (https://github.com/jpmml/openscoring). One can also use the raw evaluator, https://github.com/jpmml/jpmml-evaluator, for embedding in apps.
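For a sense of what serving through Openscoring looks like, here is a minimal Python sketch of an evaluation request body. The endpoint path, model id and field names are illustrative assumptions based on the Openscoring README, not anything from this thread:

```python
import json

# Build the JSON body for an Openscoring evaluation call, roughly
#   POST /openscoring/model/{modelId}
# The model id ("iris") and the argument field names below are
# illustrative assumptions, not taken from this thread.
def build_evaluation_request(record_id, arguments):
    """Wrap a record id and a field->value map into the request JSON."""
    return json.dumps({"id": record_id, "arguments": arguments})

body = build_evaluation_request(
    "record-001",
    {"Sepal_Length": 5.1, "Sepal_Width": 3.5,
     "Petal_Length": 1.4, "Petal_Width": 0.2},
)

# Against a running Openscoring server one would then POST this body, e.g.:
#   requests.post("http://localhost:8080/openscoring/model/iris",
#                 data=body, headers={"Content-Type": "application/json"})
print(json.loads(body)["id"])
```

With jpmml-evaluator embedded directly in a JVM app, the equivalent is building the same kind of field->value argument map in-process instead of going over HTTP.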
(Note the license on both of these is AGPL - the older version of JPMML used to be Apache 2 if I recall correctly.)

On Fri, 1 Jul 2016 at 20:15 Jacek Laskowski <ja...@japila.pl> wrote:

> Hi Nick,
>
> Thanks a lot for the exhaustive and prompt response! (In the meantime
> I watched a video about PMML to get a better understanding of the
> topic.)
>
> What tools could "consume" PMML exports (after running JPMML)? What
> tools would be the endpoint delivering low-latency predictions by
> doing that "bit of linear algebra and some basic transformations"?
>
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Jul 1, 2016 at 6:47 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
> > Generally there are two ways to use a trained pipeline model:
> > (offline) batch scoring, and real-time online scoring.
> >
> > For batch (or even "mini-batch", e.g. on Spark streaming data), then
> > yes, certainly loading the model back into Spark and feeding new data
> > through the pipeline for prediction works just fine - this is
> > essentially what is supported in 1.6 (with more or less full coverage
> > in 2.0). For large batch cases this can be quite efficient.
> >
> > However, real-time use cases usually require fairly low latency - on
> > the order of a few ms to a few hundred ms per request (some examples
> > include recommendations, ad serving, fraud detection, etc.).
> >
> > In these cases, using Spark has two issues: (1) prediction latency on
> > the pipeline, which is based on DataFrames and therefore distributed
> > execution, is usually fairly high "per request"; (2) it requires
> > pulling in all of Spark for your real-time serving layer (or running a
> > full Spark cluster), which is usually far too much overkill - all you
> > really need for serving is a bit of linear algebra and some basic
> > transformations.
> >
> > So for now, unfortunately, there are not many options for exporting
> > your pipelines and serving them outside of Spark - the JPMML-based
> > project mentioned on this thread is one option. The other option at
> > this point is to write your own export functionality and your own
> > serving layer.
> >
> > There is (very initial) movement towards improving the local serving
> > possibilities (see https://issues.apache.org/jira/browse/SPARK-13944,
> > which was the "first step" in this process).
> >
> > On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski <ja...@japila.pl> wrote:
> >>
> >> Hi Rishabh,
> >>
> >> Just today I had a similar conversation about how to do an ML
> >> Pipeline deployment, and I couldn't really answer this question -
> >> partly because I don't really understand the use case.
> >>
> >> What would you expect from ML Pipeline model deployment? You can save
> >> your model to a file with model.write.overwrite.save("model_v1").
> >>
> >> model_v1
> >> |-- metadata
> >> |   |-- _SUCCESS
> >> |   `-- part-00000
> >> `-- stages
> >>     |-- 0_regexTok_b4265099cc1c
> >>     |   `-- metadata
> >>     |       |-- _SUCCESS
> >>     |       `-- part-00000
> >>     |-- 1_hashingTF_8de997cf54ba
> >>     |   `-- metadata
> >>     |       |-- _SUCCESS
> >>     |       `-- part-00000
> >>     `-- 2_linReg_3942a71d2c0e
> >>         |-- data
> >>         |   |-- _SUCCESS
> >>         |   |-- _common_metadata
> >>         |   |-- _metadata
> >>         |   `-- part-r-00000-2096c55a-d654-42b2-90d3-5a310101cba5.gz.parquet
> >>         `-- metadata
> >>             |-- _SUCCESS
> >>             `-- part-00000
> >>
> >> 9 directories, 12 files
> >>
> >> What would you like to have outside SparkContext? What's wrong with
> >> using Spark? Just curious - hoping to understand the use case better.
> >> Thanks.
> >>
> >> Regards,
> >> Jacek Laskowski
> >> ----
> >> https://medium.com/@jaceklaskowski/
> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >> Follow me at https://twitter.com/jaceklaskowski
> >>
> >>
> >> On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj <rbnex...@gmail.com> wrote:
> >> > Hi All,
> >> >
> >> > I am looking for ways to deploy an ML Pipeline model in production.
> >> > Spark has already proved to be one of the best frameworks for model
> >> > training and creation, but once the ML Pipeline model is ready, how
> >> > can I deploy it outside the Spark context?
> >> > An MLlib model has a toPMML method, but today a Pipeline model
> >> > cannot be saved to PMML. There are some frameworks like MLeap which
> >> > are trying to abstract the Pipeline Model and provide ML Pipeline
> >> > Model deployment outside the Spark context, but currently they
> >> > don't support most of the ML transformers and estimators.
> >> > I am looking for related work going on in this area.
> >> > Any pointers will be helpful.
> >> >
> >> > Thanks,
> >> > Rishabh.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >>
> >
> >
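To make Nick's point concrete - that serving needs only "a bit of linear algebra and some basic transformations" - here is a pure-Python sketch mirroring the three stages of the saved pipeline above (regex tokenizer -> hashing TF -> linear regression). The feature count, coefficients and intercept are made-up placeholders; a real serving layer would load them from the saved metadata and Parquet files, and Spark's actual tokenizer regex and hash function differ in detail:

```python
import re

# Placeholder model parameters - a real serving layer would read these
# out of the saved model directory, not hard-code them.
NUM_FEATURES = 16                 # assumed hashingTF numFeatures
WEIGHTS = [0.1] * NUM_FEATURES    # placeholder linReg coefficients
INTERCEPT = 0.5                   # placeholder linReg intercept

def tokenize(text):
    """Stage 0 (regexTok): split on non-word characters, lowercased."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Stage 1 (hashingTF): hashed term frequencies in a fixed-size vector."""
    vec = [0.0] * num_features
    for tok in tokens:
        vec[hash(tok) % num_features] += 1.0
    return vec

def predict(text):
    """Stage 2 (linReg): dot product of features with the coefficients."""
    features = hashing_tf(tokenize(text))
    return sum(w * x for w, x in zip(WEIGHTS, features)) + INTERCEPT

print(predict("Spark ML pipelines outside Spark"))
```

The whole serving path is a regex, an array, and a dot product - no SparkContext, no DataFrames - which is why a dedicated serving layer can answer in microseconds where a per-request Spark job cannot.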