Hi Nick,

Thanks for the answer. Do you think an implementation like the one in this article is infeasible in production for, say, hundreds of queries per minute?

https://www.codementor.io/spark/tutorial/building-a-web-service-with-apache-spark-flask-example-app-part2

The article uses Flask to define routes and Spark to evaluate requests.

Regards,
Saurabh

On Fri, Jul 1, 2016 at 10:47 AM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> Generally there are 2 ways to use a trained pipeline model - (offline)
> batch scoring, and real-time online scoring.
>
> For batch (or even "mini-batch" e.g. on Spark streaming data), then yes
> certainly loading the model back in Spark and feeding new data through the
> pipeline for prediction works just fine, and this is essentially what is
> supported in 1.6 (and more or less full coverage in 2.0). For large batch
> cases this can be quite efficient.
>
> However, usually for real-time use cases, the latency required is fairly
> low - of the order of a few ms to a few 100ms for a request (some examples
> include recommendations, ad-serving, fraud detection etc).
>
> In these cases, using Spark has 2 issues: (1) latency for prediction on
> the pipeline, which is based on DataFrames and therefore distributed
> execution, is usually fairly high "per request"; (2) this requires pulling
> in all of Spark for your real-time serving layer (or running a full Spark
> cluster), which is usually way too much overkill - all you really need for
> serving is a bit of linear algebra and some basic transformations.
>
> So for now, unfortunately there is not much in the way of options for
> exporting your pipelines and serving them outside of Spark - the
> JPMML-based project mentioned on this thread is one option. The other
> option at this point is to write your own export functionality and your own
> serving layer.
>
> There is (very initial) movement towards improving the local serving
> possibilities (see https://issues.apache.org/jira/browse/SPARK-13944 which
> was the "first step" in this process).
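
To make the "bit of linear algebra and some basic transformations" point concrete, here is a minimal sketch of what a Spark-free serving function could look like for a tokenizer -> hashing TF -> linear regression pipeline. Everything in it is an assumption for illustration: the function names, the tiny feature size, and the weights are hypothetical, and zlib.crc32 is only a stand-in hash. It is not the hash Spark uses, so feature indices would not line up with a model trained in Spark. A real export layer would have to reproduce the exact transformations and parameters of the trained stages.

```python
import re
import zlib
from collections import Counter

NUM_FEATURES = 16  # hypothetical; kept tiny for illustration


def tokenize(text):
    """Lowercase the text and split it on runs of whitespace."""
    return [t for t in re.split(r"\s+", text.lower()) if t]


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Map tokens to a sparse {index: count} term-frequency vector.

    zlib.crc32 is a stand-in hash (an assumption), NOT the hash Spark
    uses, so indices will not match a model trained in Spark.
    """
    counts = Counter(zlib.crc32(t.encode("utf-8")) % num_features for t in tokens)
    return dict(counts)


def predict(features, weights, intercept):
    """Linear regression prediction: dot(weights, features) + intercept."""
    return intercept + sum(weights[i] * v for i, v in features.items())


def score(text, weights, intercept):
    """Run the whole hypothetical pipeline on one request, no Spark involved."""
    features = hashing_tf(tokenize(text), len(weights))
    return predict(features, weights, intercept)
```

A web route (in Flask or anything else) could call score() directly with weights loaded once at startup, so each request costs a regex split and a short dot product rather than a distributed DataFrame job.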
>
> On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi Rishabh,
>>
>> Just today I had a similar conversation about how to do an ML Pipeline
>> deployment and couldn't really answer this question, mostly because I
>> don't really understand the use case.
>>
>> What would you expect from ML Pipeline model deployment? You can save
>> your model to a file by model.write.overwrite.save("model_v1").
>>
>> model_v1
>> |-- metadata
>> |   |-- _SUCCESS
>> |   `-- part-00000
>> `-- stages
>>     |-- 0_regexTok_b4265099cc1c
>>     |   `-- metadata
>>     |       |-- _SUCCESS
>>     |       `-- part-00000
>>     |-- 1_hashingTF_8de997cf54ba
>>     |   `-- metadata
>>     |       |-- _SUCCESS
>>     |       `-- part-00000
>>     `-- 2_linReg_3942a71d2c0e
>>         |-- data
>>         |   |-- _SUCCESS
>>         |   |-- _common_metadata
>>         |   |-- _metadata
>>         |   `-- part-r-00000-2096c55a-d654-42b2-90d3-5a310101cba5.gz.parquet
>>         `-- metadata
>>             |-- _SUCCESS
>>             `-- part-00000
>>
>> 9 directories, 12 files
>>
>> What would you like to have outside SparkContext? What's wrong with
>> using Spark? Just curious, hoping to understand the use case better.
>> Thanks.
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj <rbnex...@gmail.com> wrote:
>> > Hi All,
>> >
>> > I am looking for ways to deploy an ML Pipeline model in production.
>> > Spark has already proved to be one of the best frameworks for model
>> > training and creation, but once the ML pipeline model is ready, how can I
>> > deploy it outside the Spark context?
>> > The MLlib model has a toPMML method, but today a Pipeline model cannot be
>> > saved to PMML. There are some frameworks like MLeap which are trying to
>> > abstract the Pipeline Model and provide ML Pipeline Model deployment
>> > outside the Spark context, but currently they don't have most of the ML
>> > transformers and estimators.
>> > I am looking for related work going on in this area.
>> > Any pointers will be helpful.
>> >
>> > Thanks,
>> > Rishabh.