Re: Serving Spark ML models via a regular Python web app

Nicholas Chammas Thu, 11 Aug 2016 07:43:27 -0700

Thanks Michael for the reference, and thanks Nick for the comprehensive
overview of existing JIRA discussions about this. I've added myself as a
watcher on the various tasks.
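
For anyone finding this thread later, here is a minimal sketch of the "roll
your own" option Nick describes below. It is only an illustration, not
something proposed in the thread. It assumes a binary logistic regression
model whose coefficients and intercept were copied out of the trained Spark
model and saved as JSON by a separate job. The JSON layout, the sample values
and the function names are all hypothetical.

```python
import json
import math

# Hypothetical export format: parameters copied out of a trained Spark
# logistic regression model and written to JSON by a separate batch job.
EXPORTED_MODEL = '{"coefficients": [0.5, -1.25, 2.0], "intercept": -0.75}'


def load_model(raw_json):
    """Parse the exported parameters into plain Python types."""
    params = json.loads(raw_json)
    coefficients = [float(c) for c in params["coefficients"]]
    return coefficients, float(params["intercept"])


def predict_probability(coefficients, intercept, features):
    """Score a single data point with the logistic function, without Spark."""
    if len(features) != len(coefficients):
        raise ValueError(
            "expected %d features, got %d" % (len(coefficients), len(features))
        )
    # Linear margin, then the sigmoid to turn it into a probability.
    margin = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-margin))


coefficients, intercept = load_model(EXPORTED_MODEL)
probability = predict_probability(coefficients, intercept, [1.0, 0.0, 0.5])
```

A Flask view could call predict_probability once per request. This only
works for models whose scoring logic is simple enough to reimplement by hand;
anything involving a full pipeline of feature transformers would need one of
the export formats or projects mentioned below.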


On Thu, Aug 11, 2016 at 3:02 AM Nick Pentreath <[email protected]>
wrote:

> Currently there is no direct way in Spark to serve models without bringing
> in all of Spark as a dependency.
>
> For Spark ML, there is actually no way to do it independently of
> DataFrames either (which for single-instance prediction makes things
> sub-optimal). That is covered here:
> https://issues.apache.org/jira/browse/SPARK-10413
>
> So, your options are (in Scala) things like MLeap, PredictionIO, or "roll
> your own". Or you can try to export to some other format such as PMML or
> PFA. Some MLlib models support PMML export, but for ML it is still missing
> (see https://issues.apache.org/jira/browse/SPARK-11171).
>
> There is an external project for PMML too (note licensing) -
> https://github.com/jpmml/jpmml-sparkml - which is by now actually quite
> comprehensive. It shows that PMML can represent a pretty large subset of
> typical ML pipeline functionality.
>
> On the Python side sadly there is even less - I would say your options are
> pretty much "roll your own" currently, or export in PMML or PFA.
>
> Finally, part of the "mllib-local" idea was around enabling this local
> model-serving (for some initial discussion about the future see
> https://issues.apache.org/jira/browse/SPARK-16365).
>
> N
>
>
> On Thu, 11 Aug 2016 at 06:28 Michael Allman <[email protected]> wrote:
>
>> Nick,
>>
>> Check out MLeap: https://github.com/TrueCar/mleap. It's not Python, but
>> we use it in production to serve a random forest model trained by a Spark
>> ML pipeline.
>>
>> Thanks,
>>
>> Michael
>>
>> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas <[email protected]>
>> wrote:
>>
>> Are there any existing JIRAs covering the possibility of serving up Spark
>> ML models via, for example, a regular Python web app?
>>
>> The story goes like this: You train your model with Spark on several TB
>> of data, and now you want to use it in a prediction service that you’re
>> building, say with Flask <http://flask.pocoo.org/>. In principle, you
>> don’t need Spark anymore since you’re just passing individual data points
>> to your model and looking for it to spit some prediction back.
>>
>> I assume this is something people do today, right? I presume Spark needs
>> to run in their web service to serve up the model. (Sorry, I’m new to the
>> ML side of Spark. 😅)
>>
>> Are there any JIRAs discussing potential improvements to this story? I
>> did a search, but I’m not sure what exactly to look for. SPARK-4587
>> <https://issues.apache.org/jira/browse/SPARK-4587> (model import/export)
>> looks relevant, but doesn’t address the story directly.
>>
>> Nick
