Hi Chris,

I was just checking out your project. I mentioned we use MLeap to serve 
predictions from a trained Spark ML RandomForest model. How would I do that 
with pipeline.io <http://pipeline.io/>? It isn't clear to me.

Thanks!

Michael

> On Aug 11, 2016, at 9:42 AM, Chris Fregly <ch...@fregly.com> wrote:
> 
> And here's a recent slide deck on pipeline.io <http://pipeline.io/> that 
> summarizes what we're working on (all open source):
> 
> https://www.slideshare.net/mobile/cfregly/advanced-spark-and-tensorflow-meetup-08042016-one-click-spark-ml-pipeline-deploy-to-production
> 
> mleap is heading the wrong direction and reinventing the wheel.  not quite 
> sure where that project will go.  doesn't seem like it will have a long 
> shelf-life in my opinion.
> 
> check out pipeline.io <http://pipeline.io/>.  some cool stuff in there.
> 
> On Aug 11, 2016, at 9:35 AM, Chris Fregly <ch...@fregly.com 
> <mailto:ch...@fregly.com>> wrote:
> 
>> this is exactly what my http://pipeline.io <http://pipeline.io/> project is 
>> addressing.  check it out and send me feedback or create issues at that 
>> github location.
>> 
>> On Aug 11, 2016, at 7:42 AM, Nicholas Chammas <nicholas.cham...@gmail.com 
>> <mailto:nicholas.cham...@gmail.com>> wrote:
>> 
>>> Thanks Michael for the reference, and thanks Nick for the comprehensive 
>>> overview of existing JIRA discussions about this. I've added myself as a 
>>> watcher on the various tasks.
>>> 
>>> On Thu, Aug 11, 2016 at 3:02 AM Nick Pentreath <nick.pentre...@gmail.com 
>>> <mailto:nick.pentre...@gmail.com>> wrote:
>>> Currently there is no direct way in Spark to serve models without bringing 
>>> in all of Spark as a dependency.
>>> 
>>> For Spark ML, there is actually no way to do it independently of DataFrames 
>>> either (which for single-instance prediction makes things sub-optimal). 
>>> That is covered here: https://issues.apache.org/jira/browse/SPARK-10413 
>>> <https://issues.apache.org/jira/browse/SPARK-10413>
>>> 
>>> So, your options are (in Scala) things like MLeap, PredictionIO, or "roll 
>>> your own". Or you can try to export to some other format such as PMML or 
>>> PFA. Some MLlib models support PMML export, but for ML it is still missing 
>>> (see https://issues.apache.org/jira/browse/SPARK-11171 
>>> <https://issues.apache.org/jira/browse/SPARK-11171>).
>>> 
>>> There is an external project for PMML too (note licensing) - 
>>> https://github.com/jpmml/jpmml-sparkml 
>>> <https://github.com/jpmml/jpmml-sparkml> - which is by now actually quite 
>>> comprehensive. It shows that PMML can represent a pretty large subset of 
>>> typical ML pipeline functionality.
>>> 
>>> On the Python side sadly there is even less - I would say your options are 
>>> pretty much "roll your own" currently, or export in PMML or PFA.
>>> 
>>> Finally, part of the "mllib-local" idea was around enabling this local 
>>> model-serving (for some initial discussion about the future see 
>>> https://issues.apache.org/jira/browse/SPARK-16365 
>>> <https://issues.apache.org/jira/browse/SPARK-16365>).
>>> 
>>> N
>>> 
>>> 
>>> On Thu, 11 Aug 2016 at 06:28 Michael Allman <mich...@videoamp.com 
>>> <mailto:mich...@videoamp.com>> wrote:
>>> Nick,
>>> 
>>> Check out MLeap: https://github.com/TrueCar/mleap 
>>> <https://github.com/TrueCar/mleap>. It's not python, but we use it in 
>>> production to serve a random forest model trained by a Spark ML pipeline.
>>> 
>>> Thanks,
>>> 
>>> Michael
>>> 
>>>> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com 
>>>> <mailto:nicholas.cham...@gmail.com>> wrote:
>>>> 
>>>> Are there any existing JIRAs covering the possibility of serving up Spark 
>>>> ML models via, for example, a regular Python web app?
>>>> 
>>>> The story goes like this: You train your model with Spark on several TB of 
>>>> data, and now you want to use it in a prediction service that you’re 
>>>> building, say with Flask <http://flask.pocoo.org/>. In principle, you 
>>>> don’t need Spark anymore since you’re just passing individual data points 
>>>> to your model and looking for it to spit some prediction back.
>>>> 
>>>> I assume this is something people do today, right? I presume Spark needs 
>>>> to run in their web service to serve up the model. (Sorry, I’m new to the 
>>>> ML side of Spark. 😅)
>>>> 
>>>> Are there any JIRAs discussing potential improvements to this story? I did 
>>>> a search, but I’m not sure what exactly to look for. SPARK-4587 
>>>> <https://issues.apache.org/jira/browse/SPARK-4587> (model import/export) 
>>>> looks relevant, but doesn’t address the story directly.
>>>> 
>>>> Nick
>>>> 
>>> 
