Thanks for the additional reference, Chris. Sounds like there are a few
independent projects addressing this story.

On Thu, Aug 11, 2016 at 12:42 PM Chris Fregly <ch...@fregly.com> wrote:

> And here's a recent slide deck on pipeline.io that summarizes what
> we're working on (all open source):
>
>
> https://www.slideshare.net/mobile/cfregly/advanced-spark-and-tensorflow-meetup-08042016-one-click-spark-ml-pipeline-deploy-to-production
>
> mleap is heading the wrong direction and reinventing the wheel.  not quite
> sure where that project will go.  doesn't seem like it will have a long
> shelf-life in my opinion.
>
> check out pipeline.io.  some cool stuff in there.
>
> On Aug 11, 2016, at 9:35 AM, Chris Fregly <ch...@fregly.com> wrote:
>
> this is exactly what my http://pipeline.io project is addressing.  check
> it out and send me feedback or create issues at that github location.
>
> On Aug 11, 2016, at 7:42 AM, Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
> Thanks Michael for the reference, and thanks Nick for the comprehensive
> overview of existing JIRA discussions about this. I've added myself as a
> watcher on the various tasks.
>
> On Thu, Aug 11, 2016 at 3:02 AM Nick Pentreath <nick.pentre...@gmail.com>
> wrote:
>
>> Currently there is no direct way in Spark to serve models without
>> bringing in all of Spark as a dependency.
>>
>> For Spark ML, there is actually no way to do it independently of
>> DataFrames either (which for single-instance prediction makes things
>> sub-optimal). That is covered here:
>> https://issues.apache.org/jira/browse/SPARK-10413
>>
>> So, your options are (in Scala) things like MLeap, PredictionIO, or "roll
>> your own". Or you can try to export to some other format such as PMML or
>> PFA. Some MLlib models support PMML export, but for ML it is still missing
>> (see https://issues.apache.org/jira/browse/SPARK-11171).
>>
>> There is an external project for PMML too (note licensing) -
>> https://github.com/jpmml/jpmml-sparkml - which is by now actually quite
>> comprehensive. It shows that PMML can represent a pretty large subset of
>> typical ML pipeline functionality.
>>
>> On the Python side sadly there is even less - I would say your options
>> are pretty much "roll your own" currently, or export in PMML or PFA.
>>
>> Finally, part of the "mllib-local" idea was around enabling this local
>> model-serving (for some initial discussion about the future see
>> https://issues.apache.org/jira/browse/SPARK-16365).
>>
>> N
>>
>>
>> On Thu, 11 Aug 2016 at 06:28 Michael Allman <mich...@videoamp.com> wrote:
>>
>>> Nick,
>>>
>>> Check out MLeap: https://github.com/TrueCar/mleap. It's not Python, but
>>> we use it in production to serve a random forest model trained by a Spark
>>> ML pipeline.
>>>
>>> Thanks,
>>>
>>> Michael
>>>
>>> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
>>> Are there any existing JIRAs covering the possibility of serving up
>>> Spark ML models via, for example, a regular Python web app?
>>>
>>> The story goes like this: You train your model with Spark on several TB
>>> of data, and now you want to use it in a prediction service that you’re
>>> building, say with Flask <http://flask.pocoo.org/>. In principle, you
>>> don’t need Spark anymore since you’re just passing individual data points
>>> to your model and looking for it to spit some prediction back.
>>>
>>> I assume this is something people do today, right? I presume Spark needs
>>> to run in their web service to serve up the model. (Sorry, I’m new to the
>>> ML side of Spark. 😅)
>>>
>>> Are there any JIRAs discussing potential improvements to this story? I
>>> did a search, but I’m not sure what exactly to look for. SPARK-4587
>>> <https://issues.apache.org/jira/browse/SPARK-4587> (model
>>> import/export) looks relevant, but doesn’t address the story directly.
>>>
>>> Nick
>>>
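As a rough illustration of the "roll your own" option mentioned in the thread for the Python side, here is a minimal sketch that scores a single data point without Spark. It assumes a binary logistic regression whose coefficients and intercept were copied out of the trained model into JSON by hand. The JSON layout and the names load_model and predict_probability are hypothetical, not part of Spark, MLeap, or pipeline.io.

```python
import json
import math

# Hypothetical export: coefficients and intercept copied out of a trained
# binary logistic regression model. This JSON layout is an assumption made
# for illustration; it is not a Spark format.
MODEL_JSON = '{"coefficients": [0.5, -0.25], "intercept": 0.0}'


def load_model(text):
    """Parse the exported parameters into (coefficients, intercept)."""
    params = json.loads(text)
    coefficients = [float(c) for c in params["coefficients"]]
    return coefficients, float(params["intercept"])


def predict_probability(coefficients, intercept, features):
    """Return P(label = 1) for one data point using plain Python."""
    if len(features) != len(coefficients):
        raise ValueError(
            "expected %d features, got %d" % (len(coefficients), len(features))
        )
    margin = intercept + sum(c * x for c, x in zip(coefficients, features))
    # Numerically stable sigmoid: avoid overflow in exp() for large |margin|.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


coefficients, intercept = load_model(MODEL_JSON)
probability = predict_probability(coefficients, intercept, [1.0, 2.0])
```

A web handler (for example in Flask) could call predict_probability once per request. This only covers the final linear model; any feature transformers in the pipeline would have to be re-implemented the same way, which is the gap the JIRAs above discuss.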
