Thanks Michael for the reference, and thanks Nick for the comprehensive overview of existing JIRA discussions about this. I've added myself as a watcher on the various tasks.
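
In case it helps anyone else reading along, here is a minimal sketch of the "roll your own" route Nick mentions, for the simplest case I can think of: a binary logistic regression. It assumes the model's coefficients and intercept have already been copied out of Spark into a JSON blob. The JSON layout, the numbers, and the function names are all made up for illustration and are not anything Spark produces. Once the parameters are out, scoring a single data point needs only the Python standard library:

```python
import json
import math

# Hypothetical export: coefficients and intercept copied out of a trained
# Spark ML logistic regression model. The JSON layout and the values are
# assumptions for illustration only.
EXPORTED_MODEL = '{"coefficients": [0.5, -1.0], "intercept": 0.25}'


def load_model(raw):
    """Parse the exported JSON into (coefficients, intercept)."""
    params = json.loads(raw)
    return params["coefficients"], params["intercept"]


def predict_proba(features, coefficients, intercept):
    """Score one data point as sigmoid(w . x + b), with no Spark or DataFrame."""
    if len(features) != len(coefficients):
        raise ValueError("feature vector has the wrong length")
    margin = intercept + sum(w * x for w, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-margin))


coefficients, intercept = load_model(EXPORTED_MODEL)
probability = predict_proba([1.0, 2.0], coefficients, intercept)
```

Something like predict_proba could then be called from a Flask view. A pipeline with feature transformers would need each stage reimplemented the same way, which is presumably where PMML export or MLeap start to look more attractive.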

On Thu, Aug 11, 2016 at 3:02 AM Nick Pentreath <nick.pentre...@gmail.com> wrote:

> Currently there is no direct way in Spark to serve models without bringing
> in all of Spark as a dependency.
>
> For Spark ML, there is actually no way to do it independently of
> DataFrames either (which for single-instance prediction makes things
> sub-optimal). That is covered here:
> https://issues.apache.org/jira/browse/SPARK-10413
>
> So, your options are (in Scala) things like MLeap, PredictionIO, or "roll
> your own". Or you can try to export to some other format such as PMML or
> PFA. Some MLlib models support PMML export, but for ML it is still missing
> (see https://issues.apache.org/jira/browse/SPARK-11171).
>
> There is an external project for PMML too (note licensing) -
> https://github.com/jpmml/jpmml-sparkml - which is by now actually quite
> comprehensive. It shows that PMML can represent a pretty large subset of
> typical ML pipeline functionality.
>
> On the Python side sadly there is even less - I would say your options are
> pretty much "roll your own" currently, or export in PMML or PFA.
>
> Finally, part of the "mllib-local" idea was around enabling this local
> model-serving (for some initial discussion about the future see
> https://issues.apache.org/jira/browse/SPARK-16365).
>
> N
>
>
> On Thu, 11 Aug 2016 at 06:28 Michael Allman <mich...@videoamp.com> wrote:
>
>> Nick,
>>
>> Check out MLeap: https://github.com/TrueCar/mleap. It's not python, but
>> we use it in production to serve a random forest model trained by a Spark
>> ML pipeline.
>>
>> Thanks,
>>
>> Michael
>>
>> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com>
>> wrote:
>>
>> Are there any existing JIRAs covering the possibility of serving up Spark
>> ML models via, for example, a regular Python web app?
>>
>> The story goes like this: You train your model with Spark on several TB
>> of data, and now you want to use it in a prediction service that you’re
>> building, say with Flask <http://flask.pocoo.org/>. In principle, you
>> don’t need Spark anymore since you’re just passing individual data points
>> to your model and looking for it to spit some prediction back.
>>
>> I assume this is something people do today, right? I presume Spark needs
>> to run in their web service to serve up the model. (Sorry, I’m new to the
>> ML side of Spark. 😅)
>>
>> Are there any JIRAs discussing potential improvements to this story? I
>> did a search, but I’m not sure what exactly to look for. SPARK-4587
>> <https://issues.apache.org/jira/browse/SPARK-4587> (model import/export)
>> looks relevant, but doesn’t address the story directly.
>>
>> Nick