And here's a recent slide deck on pipeline.io that summarizes what we're 
working on (all open source):  

https://www.slideshare.net/mobile/cfregly/advanced-spark-and-tensorflow-meetup-08042016-one-click-spark-ml-pipeline-deploy-to-production

In my opinion, MLeap is heading in the wrong direction and reinventing the 
wheel.  I'm not sure where that project will go, and it doesn't seem like it 
will have a long shelf life.

Check out pipeline.io.  There's some cool stuff in there.

> On Aug 11, 2016, at 9:35 AM, Chris Fregly <ch...@fregly.com> wrote:
> 
> this is exactly what my http://pipeline.io project is addressing.  check it 
> out and send me feedback or create issues at that github location.
> 
>> On Aug 11, 2016, at 7:42 AM, Nicholas Chammas <nicholas.cham...@gmail.com> 
>> wrote:
>> 
>> Thanks Michael for the reference, and thanks Nick for the comprehensive 
>> overview of existing JIRA discussions about this. I've added myself as a 
>> watcher on the various tasks.
>> 
>>> On Thu, Aug 11, 2016 at 3:02 AM Nick Pentreath <nick.pentre...@gmail.com> 
>>> wrote:
>>> 
>>> Currently there is no direct way in Spark to serve models without bringing 
>>> in all of Spark as a dependency.
>>> 
>>> For Spark ML, there is actually no way to do it independently of DataFrames 
>>> either (which for single-instance prediction makes things sub-optimal). 
>>> That is covered here: https://issues.apache.org/jira/browse/SPARK-10413
>>> 
>>> So, your options are (in Scala) things like MLeap, PredictionIO, or "roll 
>>> your own". Or you can try to export to some other format such as PMML or 
>>> PFA. Some MLlib models support PMML export, but for ML it is still missing 
>>> (see https://issues.apache.org/jira/browse/SPARK-11171).
>>> 
>>> There is an external project for PMML too (note licensing) - 
>>> https://github.com/jpmml/jpmml-sparkml - which is by now actually quite 
>>> comprehensive. It shows that PMML can represent a pretty large subset of 
>>> typical ML pipeline functionality.
>>> 
>>> On the Python side, sadly, there is even less - I would say your options 
>>> are pretty much "roll your own" currently, or export in PMML or PFA.
>>> 
>>> Finally, part of the "mllib-local" idea was around enabling this local 
>>> model-serving (for some initial discussion about the future see 
>>> https://issues.apache.org/jira/browse/SPARK-16365).
>>> 
>>> N
>>> 
>>> 
>>>> On Thu, 11 Aug 2016 at 06:28 Michael Allman <mich...@videoamp.com> wrote:
>>>> Nick,
>>>> 
>>>> Check out MLeap: https://github.com/TrueCar/mleap. It's not Python, but we 
>>>> use it in production to serve a random forest model trained by a Spark ML 
>>>> pipeline.
>>>> 
>>>> Thanks,
>>>> 
>>>> Michael
>>>> 
>>>>> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas 
>>>>> <nicholas.cham...@gmail.com> wrote:
>>>>> 
>>>>> Are there any existing JIRAs covering the possibility of serving up Spark 
>>>>> ML models via, for example, a regular Python web app?
>>>>> 
>>>>> The story goes like this: You train your model with Spark on several TB 
>>>>> of data, and now you want to use it in a prediction service that you’re 
>>>>> building, say with Flask. In principle, you don’t need Spark anymore 
>>>>> since you’re just passing individual data points to your model and 
>>>>> looking for it to spit some prediction back.
>>>>> 
>>>>> I assume this is something people do today, right? I presume Spark needs 
>>>>> to run in their web service to serve up the model. (Sorry, I’m new to the 
>>>>> ML side of Spark. 😅)
>>>>> 
>>>>> Are there any JIRAs discussing potential improvements to this story? I 
>>>>> did a search, but I’m not sure what exactly to look for. SPARK-4587 
>>>>> (model import/export) looks relevant, but doesn’t address the story 
>>>>> directly.
>>>>> 
>>>>> Nick
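
To make the "roll your own" option from the thread concrete, here is a minimal 
sketch in plain Python of scoring single data points without Spark. It assumes 
a binary logistic regression whose weights were copied out of the trained model 
(for example from the model's coefficients and intercept) and saved as JSON. 
The JSON layout, the example weights, and the function names are hypothetical, 
and a real pipeline would also have to reproduce any feature transformations 
applied during training.

```python
import json
import math

# Hypothetical export: weights copied out of a trained linear model and
# saved as JSON. The field names and values are assumptions for this sketch.
MODEL_JSON = '{"coefficients": [0.5, -0.25], "intercept": 0.1}'


def load_model(text):
    """Parse the exported weights into plain Python types."""
    raw = json.loads(text)
    return [float(c) for c in raw["coefficients"]], float(raw["intercept"])


def predict_proba(coefficients, intercept, features):
    """Score one data point with logistic regression, no Spark needed."""
    if len(features) != len(coefficients):
        raise ValueError("feature vector has the wrong length")
    # Linear margin followed by the logistic (sigmoid) function.
    margin = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-margin))


coefficients, intercept = load_model(MODEL_JSON)
probability = predict_proba(coefficients, intercept, [1.0, 2.0])
```

A web handler (Flask or otherwise) could load the weights once at startup and 
call predict_proba per request. This only works for models simple enough to 
re-implement by hand, which is part of why the thread points to PMML, PFA, and 
MLeap for larger pipelines.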
