This is exactly what my http://pipeline.io project is addressing. Check it out and send me feedback, or create issues at that GitHub location.
> On Aug 11, 2016, at 7:42 AM, Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
> Thanks Michael for the reference, and thanks Nick for the comprehensive
> overview of existing JIRA discussions about this. I've added myself as a
> watcher on the various tasks.
>
>> On Thu, Aug 11, 2016 at 3:02 AM Nick Pentreath <nick.pentre...@gmail.com>
>> wrote:
>> Currently there is no direct way in Spark to serve models without bringing
>> in all of Spark as a dependency.
>>
>> For Spark ML, there is actually no way to do it independently of DataFrames
>> either (which for single-instance prediction makes things sub-optimal). That
>> is covered here: https://issues.apache.org/jira/browse/SPARK-10413
>>
>> So, your options are (in Scala) things like MLeap, PredictionIO, or "roll
>> your own". Or you can try to export to some other format such as PMML or
>> PFA. Some MLlib models support PMML export, but for ML it is still missing
>> (see https://issues.apache.org/jira/browse/SPARK-11171).
>>
>> There is an external project for PMML too (note licensing) -
>> https://github.com/jpmml/jpmml-sparkml - which is by now actually quite
>> comprehensive. It shows that PMML can represent a pretty large subset of
>> typical ML pipeline functionality.
>>
>> On the Python side sadly there is even less - I would say your options are
>> pretty much "roll your own" currently, or export in PMML or PFA.
>>
>> Finally, part of the "mllib-local" idea was around enabling this local
>> model-serving (for some initial discussion about the future see
>> https://issues.apache.org/jira/browse/SPARK-16365).
>>
>> N
>>
>>
>>> On Thu, 11 Aug 2016 at 06:28 Michael Allman <mich...@videoamp.com> wrote:
>>> Nick,
>>>
>>> Check out MLeap: https://github.com/TrueCar/mleap. It's not python, but we
>>> use it in production to serve a random forest model trained by a Spark ML
>>> pipeline.
>>>
>>> Thanks,
>>>
>>> Michael
>>>
>>>> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com>
>>>> wrote:
>>>>
>>>> Are there any existing JIRAs covering the possibility of serving up Spark
>>>> ML models via, for example, a regular Python web app?
>>>>
>>>> The story goes like this: You train your model with Spark on several TB of
>>>> data, and now you want to use it in a prediction service that you’re
>>>> building, say with Flask. In principle, you don’t need Spark anymore since
>>>> you’re just passing individual data points to your model and looking for
>>>> it to spit some prediction back.
>>>>
>>>> I assume this is something people do today, right? I presume Spark needs
>>>> to run in their web service to serve up the model. (Sorry, I’m new to the
>>>> ML side of Spark. 😅)
>>>>
>>>> Are there any JIRAs discussing potential improvements to this story? I did
>>>> a search, but I’m not sure what exactly to look for. SPARK-4587 (model
>>>> import/export) looks relevant, but doesn’t address the story directly.
>>>>
>>>> Nick