Hi Chris,

I was just checking out your project. As I mentioned, we use MLeap to serve predictions from a trained Spark ML RandomForest model. How would I do that with pipeline.io? It isn't clear to me.
Thanks!

Michael

> On Aug 11, 2016, at 9:42 AM, Chris Fregly <ch...@fregly.com> wrote:
>
> And here's a recent slide deck on pipeline.io that summarizes what we're working on (all open source):
>
> https://www.slideshare.net/mobile/cfregly/advanced-spark-and-tensorflow-meetup-08042016-one-click-spark-ml-pipeline-deploy-to-production
>
> mleap is heading the wrong direction and reinventing the wheel. not quite sure where that project will go. doesn't seem like it will have a long shelf-life in my opinion.
>
> check out pipeline.io. some cool stuff in there.
>
> On Aug 11, 2016, at 9:35 AM, Chris Fregly <ch...@fregly.com> wrote:
>
>> this is exactly what my http://pipeline.io project is addressing. check it out and send me feedback or create issues at that github location.
>>
>> On Aug 11, 2016, at 7:42 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>
>>> Thanks Michael for the reference, and thanks Nick for the comprehensive overview of existing JIRA discussions about this. I've added myself as a watcher on the various tasks.
>>>
>>> On Thu, Aug 11, 2016 at 3:02 AM Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>>
>>> Currently there is no direct way in Spark to serve models without bringing in all of Spark as a dependency.
>>>
>>> For Spark ML, there is actually no way to do it independently of DataFrames either (which for single-instance prediction makes things sub-optimal). That is covered here: https://issues.apache.org/jira/browse/SPARK-10413
>>>
>>> So, your options are (in Scala) things like MLeap, PredictionIO, or "roll your own". Or you can try to export to some other format such as PMML or PFA. Some MLlib models support PMML export, but for ML it is still missing (see https://issues.apache.org/jira/browse/SPARK-11171).
>>>
>>> There is an external project for PMML too (note licensing) - https://github.com/jpmml/jpmml-sparkml - which is by now actually quite comprehensive. It shows that PMML can represent a pretty large subset of typical ML pipeline functionality.
>>>
>>> On the Python side sadly there is even less - I would say your options are pretty much "roll your own" currently, or export in PMML or PFA.
>>>
>>> Finally, part of the "mllib-local" idea was around enabling this local model-serving (for some initial discussion about the future see https://issues.apache.org/jira/browse/SPARK-16365).
>>>
>>> N
>>>
>>> On Thu, 11 Aug 2016 at 06:28 Michael Allman <mich...@videoamp.com> wrote:
>>>
>>> Nick,
>>>
>>> Check out MLeap: https://github.com/TrueCar/mleap. It's not python, but we use it in production to serve a random forest model trained by a Spark ML pipeline.
>>>
>>> Thanks,
>>>
>>> Michael
>>>
>>>> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>>
>>>> Are there any existing JIRAs covering the possibility of serving up Spark ML models via, for example, a regular Python web app?
>>>>
>>>> The story goes like this: You train your model with Spark on several TB of data, and now you want to use it in a prediction service that you’re building, say with Flask <http://flask.pocoo.org/>. In principle, you don’t need Spark anymore since you’re just passing individual data points to your model and looking for it to spit some prediction back.
>>>>
>>>> I assume this is something people do today, right? I presume Spark needs to run in their web service to serve up the model. (Sorry, I’m new to the ML side of Spark. 😅)
>>>>
>>>> Are there any JIRAs discussing potential improvements to this story? I did a search, but I’m not sure what exactly to look for. SPARK-4587 <https://issues.apache.org/jira/browse/SPARK-4587> (model import/export) looks relevant, but doesn’t address the story directly.
>>>>
>>>> Nick
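To make the "roll your own" option from the thread concrete, here is a minimal sketch of scoring a random forest in plain Python, one data point at a time, with no Spark, JVM, or DataFrame at serving time. It assumes the trained trees have already been exported to a hypothetical JSON layout; that layout and the names `load_forest`, `leaf_probs`, `predict_proba`, and `predict` are illustrative only and are not part of Spark, MLeap, PMML, or pipeline.io. The split rule (`value <= threshold` goes left) and averaging of per-tree class probabilities are assumptions chosen for the sketch, not a port of Spark's own scoring code.

```python
"""Sketch: score a random forest without Spark, one data point at a time.

ASSUMPTION: the trees were exported to a HYPOTHETICAL JSON format
(not an MLeap, PMML, or Spark format):
  internal node: {"feature": int, "threshold": float, "left": {...}, "right": {...}}
  leaf node:     {"probs": [p_class0, p_class1, ...]}
"""
import json
from typing import Any, Dict, List, Sequence

Node = Dict[str, Any]


def load_forest(json_text: str) -> List[Node]:
    """Parse the exported forest; nothing Spark-related is needed here."""
    return json.loads(json_text)["trees"]


def leaf_probs(node: Node, features: Sequence[float]) -> List[float]:
    """Walk one tree down to a leaf and return that leaf's class probabilities."""
    while "probs" not in node:
        # Assumed split convention: feature value <= threshold goes left.
        if features[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["probs"]


def predict_proba(trees: List[Node], features: Sequence[float]) -> List[float]:
    """Average the per-tree class probabilities across the whole forest."""
    totals: List[float] = []
    for tree in trees:
        probs = leaf_probs(tree, features)
        if not totals:
            totals = [0.0] * len(probs)
        totals = [t + p for t, p in zip(totals, probs)]
    return [t / len(trees) for t in totals]


def predict(trees: List[Node], features: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    probs = predict_proba(trees, features)
    return max(range(len(probs)), key=lambda i: probs[i])


if __name__ == "__main__":
    # Two tiny hand-written trees standing in for a real exported model.
    exported = json.dumps({"trees": [
        {"feature": 0, "threshold": 0.5,
         "left": {"probs": [0.9, 0.1]}, "right": {"probs": [0.2, 0.8]}},
        {"feature": 1, "threshold": 10.0,
         "left": {"probs": [0.6, 0.4]}, "right": {"probs": [0.3, 0.7]}},
    ]})
    forest = load_forest(exported)
    print(predict_proba(forest, [0.7, 3.0]))
    print(predict(forest, [0.7, 3.0]))
```

In a Flask service like the one Nicholas describes, `load_forest` would run once at startup and `predict` would run inside the request handler. The hard part this sketch skips is producing a faithful export from the trained Spark ML model, which is the gap that MLeap, PMML/PFA export, and the proposed mllib-local work discussed in the thread are aimed at.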