Thanks for the additional reference, Chris. It sounds like there are a few independent projects addressing this story.
On Thu, Aug 11, 2016 at 12:42 PM Chris Fregly <[email protected]> wrote:

> And here's a recent slide deck on the pipeline.io that summarizes what
> we're working on (all open source):
>
> https://www.slideshare.net/mobile/cfregly/advanced-spark-and-tensorflow-meetup-08042016-one-click-spark-ml-pipeline-deploy-to-production
>
> mleap is heading the wrong direction and reinventing the wheel. not quite
> sure where that project will go. doesn't seem like it will have a long
> shelf-life in my opinion.
>
> check out pipeline.io. some cool stuff in there.
>
> On Aug 11, 2016, at 9:35 AM, Chris Fregly <[email protected]> wrote:
>
> this is exactly what my http://pipeline.io project is addressing. check
> it out and send me feedback or create issues at that github location.
>
> On Aug 11, 2016, at 7:42 AM, Nicholas Chammas <[email protected]>
> wrote:
>
> Thanks Michael for the reference, and thanks Nick for the comprehensive
> overview of existing JIRA discussions about this. I've added myself as a
> watcher on the various tasks.
>
> On Thu, Aug 11, 2016 at 3:02 AM Nick Pentreath <[email protected]>
> wrote:
>
>> Currently there is no direct way in Spark to serve models without
>> bringing in all of Spark as a dependency.
>>
>> For Spark ML, there is actually no way to do it independently of
>> DataFrames either (which for single-instance prediction makes things
>> sub-optimal). That is covered here:
>> https://issues.apache.org/jira/browse/SPARK-10413
>>
>> So, your options are (in Scala) things like MLeap, PredictionIO, or "roll
>> your own". Or you can try to export to some other format such as PMML or
>> PFA. Some MLlib models support PMML export, but for ML it is still missing
>> (see https://issues.apache.org/jira/browse/SPARK-11171).
>>
>> There is an external project for PMML too (note licensing) -
>> https://github.com/jpmml/jpmml-sparkml - which is by now actually quite
>> comprehensive. It shows that PMML can represent a pretty large subset of
>> typical ML pipeline functionality.
>>
>> On the Python side sadly there is even less - I would say your options
>> are pretty much "roll your own" currently, or export in PMML or PFA.
>>
>> Finally, part of the "mllib-local" idea was around enabling this local
>> model-serving (for some initial discussion about the future see
>> https://issues.apache.org/jira/browse/SPARK-16365).
>>
>> N
>>
>>
>> On Thu, 11 Aug 2016 at 06:28 Michael Allman <[email protected]> wrote:
>>
>>> Nick,
>>>
>>> Check out MLeap: https://github.com/TrueCar/mleap. It's not python, but
>>> we use it in production to serve a random forest model trained by a Spark
>>> ML pipeline.
>>>
>>> Thanks,
>>>
>>> Michael
>>>
>>> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas <[email protected]> wrote:
>>>
>>> Are there any existing JIRAs covering the possibility of serving up
>>> Spark ML models via, for example, a regular Python web app?
>>>
>>> The story goes like this: You train your model with Spark on several TB
>>> of data, and now you want to use it in a prediction service that you’re
>>> building, say with Flask <http://flask.pocoo.org/>. In principle, you
>>> don’t need Spark anymore since you’re just passing individual data points
>>> to your model and looking for it to spit some prediction back.
>>>
>>> I assume this is something people do today, right? I presume Spark needs
>>> to run in their web service to serve up the model. (Sorry, I’m new to the
>>> ML side of Spark. 😅)
>>>
>>> Are there any JIRAs discussing potential improvements to this story? I
>>> did a search, but I’m not sure what exactly to look for. SPARK-4587
>>> <https://issues.apache.org/jira/browse/SPARK-4587> (model
>>> import/export) looks relevant, but doesn’t address the story directly.
>>>
>>> Nick
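
To make the "roll your own" option from the thread concrete, here is a minimal sketch for the simplest case, a binary logistic regression. It assumes the coefficients and intercept of the trained Spark model were written out to JSON by hand in the training job. The JSON layout, the numbers, and the function names below are all hypothetical and are not part of any Spark API; a random forest or a full pipeline with feature transformers would need considerably more work.

```python
import json
import math

# Hypothetical export: the Spark training job is assumed to have dumped the
# model's coefficients and intercept to JSON. These values are made up.
EXPORTED_MODEL = json.dumps({
    "coefficients": [0.5, -0.25],
    "intercept": 0.1,
})


def load_model(serialized):
    """Parse the exported parameters into a plain dict of floats."""
    params = json.loads(serialized)
    return {
        "coefficients": [float(c) for c in params["coefficients"]],
        "intercept": float(params["intercept"]),
    }


def predict_probability(model, features):
    """Score a single data point with the logistic function, without Spark."""
    if len(features) != len(model["coefficients"]):
        raise ValueError("feature vector has the wrong length")
    margin = model["intercept"] + sum(
        w * x for w, x in zip(model["coefficients"], features)
    )
    return 1.0 / (1.0 + math.exp(-margin))


model = load_model(EXPORTED_MODEL)
probability = predict_probability(model, [1.0, 2.0])
```

A function like predict_probability could then be called from a Flask view that parses the features out of the request, so the web service carries no Spark dependency at all. The trade-off is that every feature transformation applied during training has to be reimplemented by hand, which is the gap that PMML/PFA export and projects like MLeap aim to close.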
