And here's a recent slide deck on pipeline.io that summarizes what we're working on (all open source):
https://www.slideshare.net/mobile/cfregly/advanced-spark-and-tensorflow-meetup-08042016-one-click-spark-ml-pipeline-deploy-to-production

mleap is heading the wrong direction and reinventing the wheel. not quite sure where that project will go. doesn't seem like it will have a long shelf-life in my opinion.

check out pipeline.io. some cool stuff in there.

> On Aug 11, 2016, at 9:35 AM, Chris Fregly <ch...@fregly.com> wrote:
>
> this is exactly what my http://pipeline.io project is addressing. check it
> out and send me feedback or create issues at that github location.
>
>> On Aug 11, 2016, at 7:42 AM, Nicholas Chammas <nicholas.cham...@gmail.com>
>> wrote:
>>
>> Thanks Michael for the reference, and thanks Nick for the comprehensive
>> overview of existing JIRA discussions about this. I've added myself as a
>> watcher on the various tasks.
>>
>>> On Thu, Aug 11, 2016 at 3:02 AM Nick Pentreath <nick.pentre...@gmail.com>
>>> wrote:
>>> Currently there is no direct way in Spark to serve models without bringing
>>> in all of Spark as a dependency.
>>>
>>> For Spark ML, there is actually no way to do it independently of DataFrames
>>> either (which for single-instance prediction makes things sub-optimal).
>>> That is covered here: https://issues.apache.org/jira/browse/SPARK-10413
>>>
>>> So, your options are (in Scala) things like MLeap, PredictionIO, or "roll
>>> your own". Or you can try to export to some other format such as PMML or
>>> PFA. Some MLlib models support PMML export, but for ML it is still missing
>>> (see https://issues.apache.org/jira/browse/SPARK-11171).
>>>
>>> There is an external project for PMML too (note licensing) -
>>> https://github.com/jpmml/jpmml-sparkml - which is by now actually quite
>>> comprehensive. It shows that PMML can represent a pretty large subset of
>>> typical ML pipeline functionality.
>>>
>>> On the Python side sadly there is even less - I would say your options are
>>> pretty much "roll your own" currently, or export in PMML or PFA.
>>>
>>> Finally, part of the "mllib-local" idea was around enabling this local
>>> model-serving (for some initial discussion about the future see
>>> https://issues.apache.org/jira/browse/SPARK-16365).
>>>
>>> N
>>>
>>>
>>>> On Thu, 11 Aug 2016 at 06:28 Michael Allman <mich...@videoamp.com> wrote:
>>>> Nick,
>>>>
>>>> Check out MLeap: https://github.com/TrueCar/mleap. It's not python, but we
>>>> use it in production to serve a random forest model trained by a Spark ML
>>>> pipeline.
>>>>
>>>> Thanks,
>>>>
>>>> Michael
>>>>
>>>>> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas
>>>>> <nicholas.cham...@gmail.com> wrote:
>>>>>
>>>>> Are there any existing JIRAs covering the possibility of serving up Spark
>>>>> ML models via, for example, a regular Python web app?
>>>>>
>>>>> The story goes like this: You train your model with Spark on several TB
>>>>> of data, and now you want to use it in a prediction service that you’re
>>>>> building, say with Flask. In principle, you don’t need Spark anymore
>>>>> since you’re just passing individual data points to your model and
>>>>> looking for it to spit some prediction back.
>>>>>
>>>>> I assume this is something people do today, right? I presume Spark needs
>>>>> to run in their web service to serve up the model. (Sorry, I’m new to the
>>>>> ML side of Spark. 😅)
>>>>>
>>>>> Are there any JIRAs discussing potential improvements to this story? I
>>>>> did a search, but I’m not sure what exactly to look for. SPARK-4587
>>>>> (model import/export) looks relevant, but doesn’t address the story
>>>>> directly.
>>>>>
>>>>> Nick
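
For anyone wondering what the "roll your own" option in the thread could look like on the Python side, here is a minimal sketch. It is not taken from any of the projects mentioned above. It assumes the model is a binary logistic regression whose coefficients and intercept were copied out of the trained Spark ML model and written to JSON by the training job. The JSON layout, the `LocalLogisticModel` class, and the numbers are all hypothetical, and a real pipeline would also have to reproduce any feature transformations by hand.

```python
import json
import math

# Hypothetical export. In practice these values would come from the trained
# Spark ML model (its coefficients and intercept), dumped by the training job.
EXPORTED_MODEL = json.dumps({
    "coefficients": [0.8, -1.2, 0.05],
    "intercept": -0.3,
})


class LocalLogisticModel:
    """Scores single data points with no Spark dependency (illustrative only)."""

    def __init__(self, coefficients, intercept):
        self.coefficients = [float(w) for w in coefficients]
        self.intercept = float(intercept)

    @classmethod
    def from_json(cls, payload):
        # Parse the assumed JSON layout: {"coefficients": [...], "intercept": x}
        data = json.loads(payload)
        return cls(data["coefficients"], data["intercept"])

    def predict_probability(self, features):
        if len(features) != len(self.coefficients):
            raise ValueError(
                "expected %d features, got %d"
                % (len(self.coefficients), len(features))
            )
        # Linear margin followed by the logistic (sigmoid) function.
        margin = self.intercept + sum(
            w * x for w, x in zip(self.coefficients, features)
        )
        return 1.0 / (1.0 + math.exp(-margin))

    def predict(self, features, threshold=0.5):
        # Returns the class label 1 or 0 for a single data point.
        return 1 if self.predict_probability(features) >= threshold else 0


model = LocalLogisticModel.from_json(EXPORTED_MODEL)
print(model.predict_probability([1.0, 0.5, 10.0]))
```

A Flask view (or any other web handler) could load the model once at startup and call `predict_probability` per request, which matches the "passing individual data points" story in the original question. For anything beyond simple linear models, exporting to PMML or PFA as suggested above is probably less error-prone than hand-copying parameters.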