Re: Deploying ML Pipeline Model

2016-07-05 Thread Nick Pentreath
It all depends on your latency requirements and volume. 100s of queries per minute, with an acceptable latency of up to a few seconds? Yes, you could use Spark for serving, especially if you're smart about caching results (and I don't mean just Spark caching, but caching recommendation results for
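Nick's caching suggestion can be sketched as a small TTL cache sitting in front of the scoring call. Everything below is a hypothetical stand-in (the stub `score_with_model` is not Spark API); it only shows the shape of the idea: repeated requests for the same user are served from memory instead of re-running the model.

```python
import time

# Toy stand-in for the expensive Spark-backed scoring call (hypothetical name).
def score_with_model(user_id):
    return [("item-%d" % i, 1.0 / (i + user_id + 1)) for i in range(3)]

class RecommendationCache:
    """Tiny TTL cache so repeated requests skip recomputation."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._store = {}  # user_id -> (expiry_timestamp, recommendations)

    def get(self, user_id):
        entry = self._store.get(user_id)
        now = time.time()
        if entry is not None and entry[0] > now:
            return entry[1]                       # fresh cache hit
        recs = score_with_model(user_id)          # miss: recompute and store
        self._store[user_id] = (now + self.ttl, recs)
        return recs

cache = RecommendationCache(ttl_seconds=60.0)
first = cache.get(42)    # miss: computed
second = cache.get(42)   # hit: served from the cache, no recomputation
```

A real deployment would back this with an external store (Redis, memcached) so multiple serving instances share results, but the hit/miss logic is the same.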

Re: Deploying ML Pipeline Model

2016-07-05 Thread Nick Pentreath
Sean is correct - we now use jpmml-model (which is actually BSD 3-clause, whereas old jpmml was A2L, but either works) On Fri, 1 Jul 2016 at 21:40 Sean Owen wrote: > (The more core JPMML libs are Apache 2; OpenScoring is AGPL. We use > JPMML in Spark and couldn't otherwise

Re: Deploying ML Pipeline Model

2016-07-01 Thread Saurabh Sardeshpande
Hi Nick, Thanks for the answer. Do you think an implementation like the one in this article is infeasible in production for, say, hundreds of queries per minute? https://www.codementor.io/spark/tutorial/building-a-web-service-with-apache-spark-flask-example-app-part2. The article uses Flask to
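The article's Flask approach boils down to a thin HTTP endpoint in front of a scoring function. Here is a rough stdlib-only sketch of that pattern (the stub `predict` stands in for the Spark-backed call; none of this is the article's actual code):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in model: in the article's setup this would call into a long-lived
# SparkContext; here it is a hypothetical pure-Python stub.
def predict(features):
    return {"score": sum(features) / max(len(features), 1)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))["features"]
        body = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    "http://127.0.0.1:%d/predict" % server.server_port,
    data=json.dumps({"features": [1.0, 2.0, 3.0]}).encode(),
    headers={"Content-Type": "application/json"},
)
response = json.loads(urllib.request.urlopen(req).read())
server.shutdown()
```

Whether this holds up at hundreds of queries per minute depends mostly on what `predict` does per request, which is exactly the point of the latency discussion above.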

Re: Deploying ML Pipeline Model

2016-07-01 Thread Sean Owen
(The more core JPMML libs are Apache 2; OpenScoring is AGPL. We use JPMML in Spark and couldn't otherwise because the Affero license is not Apache compatible.) On Fri, Jul 1, 2016 at 8:16 PM, Nick Pentreath wrote: > I believe open-scoring is one of the well-known PMML

Re: Deploying ML Pipeline Model

2016-07-01 Thread Nick Pentreath
I believe open-scoring is one of the well-known PMML serving frameworks in Java land (https://github.com/jpmml/openscoring). One can also use the raw https://github.com/jpmml/jpmml-evaluator for embedding in apps. (Note the license on both of these is AGPL - the older version of JPMML used to be

Re: Deploying ML Pipeline Model

2016-07-01 Thread Jacek Laskowski
Hi Nick, Thanks a lot for the exhaustive and prompt response! (In the meantime I watched a video about PMML to get a better understanding of the topic). What are the tools that could "consume" PMML exports (after running JPMML)? What tools would be the endpoint to deliver low-latency predictions

Re: Deploying ML Pipeline Model

2016-07-01 Thread Nick Pentreath
Generally there are two ways to use a trained pipeline model: (offline) batch scoring, and real-time online scoring. For batch (or even "mini-batch", e.g. on Spark streaming data), then yes, certainly loading the model back in Spark and feeding new data through the pipeline for prediction works just
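The batch-vs-online split Nick describes can be sketched end to end with a stub model. The names and the in-memory store below are hypothetical; only the `PipelineModel.load(path).transform(df)` calls mentioned in the comment are actual Spark ML API.

```python
# Sketch of the two serving patterns, with a stub model.
# In Spark the batch side would be PipelineModel.load(path).transform(df);
# the stub below just keeps the shape of the workflow.

def model_predict(row):
    return 1 if row["amount"] > 100 else 0   # hypothetical trained model

# --- batch (offline) scoring: run periodically, persist results ---
new_data = [
    {"id": "a", "amount": 250},
    {"id": "b", "amount": 40},
]
prediction_store = {row["id"]: model_predict(row) for row in new_data}
# ...in production, write prediction_store to a low-latency store
# (Redis, Cassandra, an RDBMS) rather than keeping it in memory.

# --- real-time online serving: cheap lookups, no Spark job per request ---
def serve(entity_id, default=0):
    return prediction_store.get(entity_id, default)

result = serve("a")   # precomputed score; no model call at request time
```

The trade-off is freshness: batch-scored predictions lag the data, whereas true online scoring (PMML export, or reimplementing the pipeline outside Spark) scores each request as it arrives.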

Re: Deploying ML Pipeline Model

2016-07-01 Thread Jacek Laskowski
Hi Rishabh, I've just today had a similar conversation about how to do a ML Pipeline deployment and couldn't really answer this question, mainly because I don't really understand the use case. What would you expect from ML Pipeline model deployment? You can save your model to a file by

Re: Deploying ML Pipeline Model

2016-07-01 Thread Silvio Fiorito
, Silvio From: Rishabh Bhardwaj <rbnex...@gmail.com> Date: Friday, July 1, 2016 at 7:54 AM Subject: Deploying ML Pipeline Model

Re: Deploying ML Pipeline Model

2016-07-01 Thread Steve Goodman
Hi Rishabh, I have a similar use-case and have struggled to find the best solution. As I understand it, 1.6 provides pipeline persistence in Scala, and that will be expanded in 2.x. This project https://github.com/jpmml/jpmml-sparkml claims to support about a dozen pipeline transformers, and 6 or

Deploying ML Pipeline Model

2016-07-01 Thread Rishabh Bhardwaj
Hi All, I am looking for ways to deploy a ML Pipeline model in production. Spark has already proved to be one of the best frameworks for model training and creation, but once the ML pipeline model is ready, how can I deploy it outside the Spark context? The MLlib model has a toPMML method but today