I believe Openscoring is one of the better-known PMML serving frameworks in Java land (https://github.com/jpmml/openscoring). One can also use the raw evaluator, https://github.com/jpmml/jpmml-evaluator, for embedding in apps.
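For a sense of what serving through Openscoring looks like, here is a minimal Python sketch of an evaluation request body. The endpoint path, model id and field names are illustrative assumptions based on the Openscoring README, not anything from this thread:

```python
import json

# Build the JSON body for an Openscoring evaluation call, roughly
#   POST /openscoring/model/{modelId}
# The model id ("iris") and the argument field names below are
# illustrative assumptions, not taken from this thread.
def build_evaluation_request(record_id, arguments):
    """Wrap a record id and a field->value map into the request JSON."""
    return json.dumps({"id": record_id, "arguments": arguments})

body = build_evaluation_request(
    "record-001",
    {"Sepal_Length": 5.1, "Sepal_Width": 3.5,
     "Petal_Length": 1.4, "Petal_Width": 0.2},
)

# Against a running Openscoring server one would then POST this body, e.g.:
#   requests.post("http://localhost:8080/openscoring/model/iris",
#                 data=body, headers={"Content-Type": "application/json"})
print(json.loads(body)["id"])
```

With jpmml-evaluator embedded directly in a JVM app, the equivalent is building the same kind of field->value argument map in-process instead of going over HTTP.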
(Note the license on both of these is AGPL - the older version of JPMML used to be Apache 2 if I recall correctly.)

On Fri, 1 Jul 2016 at 20:15 Jacek Laskowski <ja...@japila.pl> wrote:

> Hi Nick,
>
> Thanks a lot for the exhaustive and prompt response! (In the meantime
> I watched a video about PMML to get a better understanding of the
> topic.)
>
> What tools could "consume" PMML exports (after running JPMML)? What
> tools would be the endpoint delivering low-latency predictions by
> doing that "bit of linear algebra and some basic transformations"?
>
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Jul 1, 2016 at 6:47 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
> > Generally there are two ways to use a trained pipeline model:
> > (offline) batch scoring, and real-time online scoring.
> >
> > For batch (or even "mini-batch", e.g. on Spark streaming data), then
> > yes, certainly loading the model back into Spark and feeding new data
> > through the pipeline for prediction works just fine - this is
> > essentially what is supported in 1.6 (with more or less full coverage
> > in 2.0). For large batch cases this can be quite efficient.
> >
> > However, real-time use cases usually require fairly low latency - on
> > the order of a few ms to a few hundred ms per request (some examples
> > include recommendations, ad serving, fraud detection, etc.).
> >
> > In these cases, using Spark has two issues: (1) prediction latency on
> > the pipeline, which is based on DataFrames and therefore distributed
> > execution, is usually fairly high "per request"; (2) it requires
> > pulling in all of Spark for your real-time serving layer (or running a
> > full Spark cluster), which is usually far too much overkill - all you
> > really need for serving is a bit of linear algebra and some basic
> > transformations.
> >
> > So for now, unfortunately, there are not many options for exporting
> > your pipelines and serving them outside of Spark - the JPMML-based
> > project mentioned on this thread is one option. The other option at
> > this point is to write your own export functionality and your own
> > serving layer.
> >
> > There is (very initial) movement towards improving the local serving
> > possibilities (see https://issues.apache.org/jira/browse/SPARK-13944,
> > which was the "first step" in this process).
> >
> > On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski <ja...@japila.pl> wrote:
> >>
> >> Hi Rishabh,
> >>
> >> Just today I had a similar conversation about how to do an ML
> >> Pipeline deployment, and I couldn't really answer this question -
> >> partly because I don't really understand the use case.
> >>
> >> What would you expect from ML Pipeline model deployment? You can save
> >> your model to a file with model.write.overwrite.save("model_v1").
> >>
> >> model_v1
> >> |-- metadata
> >> |   |-- _SUCCESS
> >> |   `-- part-00000
> >> `-- stages
> >>     |-- 0_regexTok_b4265099cc1c
> >>     |   `-- metadata
> >>     |       |-- _SUCCESS
> >>     |       `-- part-00000
> >>     |-- 1_hashingTF_8de997cf54ba
> >>     |   `-- metadata
> >>     |       |-- _SUCCESS
> >>     |       `-- part-00000
> >>     `-- 2_linReg_3942a71d2c0e
> >>         |-- data
> >>         |   |-- _SUCCESS
> >>         |   |-- _common_metadata
> >>         |   |-- _metadata
> >>         |   `-- part-r-00000-2096c55a-d654-42b2-90d3-5a310101cba5.gz.parquet
> >>         `-- metadata
> >>             |-- _SUCCESS
> >>             `-- part-00000
> >>
> >> 9 directories, 12 files
> >>
> >> What would you like to have outside SparkContext? What's wrong with
> >> using Spark? Just curious - hoping to understand the use case better.
> >> Thanks.
> >>
> >> Regards,
> >> Jacek Laskowski
> >> ----
> >> https://medium.com/@jaceklaskowski/
> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >> Follow me at https://twitter.com/jaceklaskowski
> >>
> >>
> >> On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj <rbnex...@gmail.com> wrote:
> >> > Hi All,
> >> >
> >> > I am looking for ways to deploy an ML Pipeline model in production.
> >> > Spark has already proved to be one of the best frameworks for model
> >> > training and creation, but once the ML Pipeline model is ready, how
> >> > can I deploy it outside the Spark context?
> >> > An MLlib model has a toPMML method, but today a Pipeline model
> >> > cannot be saved to PMML. There are some frameworks like MLeap which
> >> > are trying to abstract the Pipeline Model and provide ML Pipeline
> >> > Model deployment outside the Spark context, but currently they
> >> > don't support most of the ML transformers and estimators.
> >> > I am looking for related work going on in this area.
> >> > Any pointers will be helpful.
> >> >
> >> > Thanks,
> >> > Rishabh.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >>
> >
> >
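To make Nick's point concrete - that serving needs only "a bit of linear algebra and some basic transformations" - here is a pure-Python sketch mirroring the three stages of the saved pipeline above (regex tokenizer -> hashing TF -> linear regression). The feature count, coefficients and intercept are made-up placeholders; a real serving layer would load them from the saved metadata and Parquet files, and Spark's actual tokenizer regex and hash function differ in detail:

```python
import re

# Placeholder model parameters - a real serving layer would read these
# out of the saved model directory, not hard-code them.
NUM_FEATURES = 16                 # assumed hashingTF numFeatures
WEIGHTS = [0.1] * NUM_FEATURES    # placeholder linReg coefficients
INTERCEPT = 0.5                   # placeholder linReg intercept

def tokenize(text):
    """Stage 0 (regexTok): split on non-word characters, lowercased."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Stage 1 (hashingTF): hashed term frequencies in a fixed-size vector."""
    vec = [0.0] * num_features
    for tok in tokens:
        vec[hash(tok) % num_features] += 1.0
    return vec

def predict(text):
    """Stage 2 (linReg): dot product of features with the coefficients."""
    features = hashing_tf(tokenize(text))
    return sum(w * x for w, x in zip(WEIGHTS, features)) + INTERCEPT

print(predict("Spark ML pipelines outside Spark"))
```

The whole serving path is a regex, an array, and a dot product - no SparkContext, no DataFrames - which is why a dedicated serving layer can answer in microseconds where a per-request Spark job cannot.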