This is exactly what my http://pipeline.io project is addressing. Check it out 
and send me feedback, or create issues at that GitHub location.

> On Aug 11, 2016, at 7:42 AM, Nicholas Chammas <nicholas.cham...@gmail.com> 
> wrote:
> 
> Thanks Michael for the reference, and thanks Nick for the comprehensive 
> overview of existing JIRA discussions about this. I've added myself as a 
> watcher on the various tasks.
> 
>> On Thu, Aug 11, 2016 at 3:02 AM Nick Pentreath <nick.pentre...@gmail.com> 
>> wrote:
>> Currently there is no direct way in Spark to serve models without bringing 
>> in all of Spark as a dependency.
>> 
>> For Spark ML, there is actually no way to do it independently of DataFrames 
>> either (which for single-instance prediction makes things sub-optimal). That 
>> is covered here: https://issues.apache.org/jira/browse/SPARK-10413
>> 
>> So, your options are (in Scala) things like MLeap, PredictionIO, or "roll 
>> your own". Or you can try to export to some other format such as PMML or 
>> PFA. Some MLlib models support PMML export, but for ML it is still missing 
>> (see https://issues.apache.org/jira/browse/SPARK-11171).
>> 
>> There is an external project for PMML too (note licensing) - 
>> https://github.com/jpmml/jpmml-sparkml - which is by now actually quite 
>> comprehensive. It shows that PMML can represent a pretty large subset of 
>> typical ML pipeline functionality.
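
A quick illustration of that representability point: PMML is plain XML, so even a stdlib parser can pull a model's parameters back out and score without any ML runtime. The fragment below is a hand-written, simplified PMML regression table for illustration only, not actual jpmml-sparkml output.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified PMML 4.3 fragment encoding the linear model
# y = 0.5 + 1.2*x1 - 0.3*x2 (real jpmml-sparkml output is much richer).
PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="0.5">
      <NumericPredictor name="x1" coefficient="1.2"/>
      <NumericPredictor name="x2" coefficient="-0.3"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_3"}

def load_regression(pmml_text):
    """Parse intercept and per-feature coefficients out of the PMML XML."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefs = {p.get("name"): float(p.get("coefficient"))
             for p in table.findall("p:NumericPredictor", NS)}
    return intercept, coefs

def score(intercept, coefs, features):
    """Score a single instance: intercept plus the dot product."""
    return intercept + sum(c * features[name] for name, c in coefs.items())

intercept, coefs = load_regression(PMML)
print(score(intercept, coefs, {"x1": 2.0, "x2": 1.0}))  # 0.5 + 2.4 - 0.3 = 2.6
```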
>> 
>> On the Python side there is sadly even less - I would say your options are 
>> currently pretty much "roll your own", or export to PMML or PFA.
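
A minimal sketch of the "roll your own" option for a binary logistic regression: copy the fitted parameters out of the Spark model (e.g. `coefficients` and `intercept` on a `LogisticRegressionModel`) and reimplement the scoring math in plain Python. The weight values below are made up for illustration.

```python
import math

# Parameters copied out of a fitted Spark LogisticRegressionModel
# (model.coefficients / model.intercept); these values are made up.
WEIGHTS = [0.8, -1.5, 0.3]
INTERCEPT = 0.25

def predict_proba(features):
    """Probability of the positive class for one instance - no Spark needed."""
    margin = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-margin))

def predict(features, threshold=0.5):
    """Hard 0/1 label at the given decision threshold."""
    return 1 if predict_proba(features) >= threshold else 0
```

The catch, of course, is that this hand-written scorer must stay in sync with every feature-engineering stage of the original pipeline - which is exactly the fragility that PMML/PFA export and projects like MLeap aim to remove.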
>> 
>> Finally, part of the "mllib-local" idea was around enabling this local 
>> model-serving (for some initial discussion about the future see 
>> https://issues.apache.org/jira/browse/SPARK-16365).
>> 
>> N
>> 
>> 
>>> On Thu, 11 Aug 2016 at 06:28 Michael Allman <mich...@videoamp.com> wrote:
>>> Nick,
>>> 
>>> Check out MLeap: https://github.com/TrueCar/mleap. It's not Python, but we 
>>> use it in production to serve a random forest model trained by a Spark ML 
>>> pipeline.
>>> 
>>> Thanks,
>>> 
>>> Michael
>>> 
>>>> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com> 
>>>> wrote:
>>>> 
>>>> Are there any existing JIRAs covering the possibility of serving up Spark 
>>>> ML models via, for example, a regular Python web app?
>>>> 
>>>> The story goes like this: You train your model with Spark on several TB of 
>>>> data, and now you want to use it in a prediction service that you’re 
>>>> building, say with Flask. In principle, you don’t need Spark anymore since 
>>>> you’re just passing individual data points to your model and looking for 
>>>> it to spit some prediction back.
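
For a sense of scale: once the scoring math has been extracted from the model, the web-service part of that story really is tiny. A sketch using only the standard library (a Flask view would look much the same); the model weights here are made up stand-ins for something trained in Spark.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Made-up parameters standing in for a model trained in Spark.
WEIGHTS = [0.8, -1.5]
INTERCEPT = 0.25

def predict(features):
    """Logistic-regression probability for a single instance."""
    margin = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-margin))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body like {"features": [1.0, 2.0]}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet

# To serve: HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```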
>>>> 
>>>> I assume this is something people do today, right? I presume Spark needs 
>>>> to run in their web service to serve up the model. (Sorry, I’m new to the 
>>>> ML side of Spark. 😅)
>>>> 
>>>> Are there any JIRAs discussing potential improvements to this story? I did 
>>>> a search, but I’m not sure what exactly to look for. SPARK-4587 (model 
>>>> import/export) looks relevant, but doesn’t address the story directly.
>>>> 
>>>> Nick
