Re: Spark Local Pipelines

2017-05-18 Thread Asher Krim
Hi Cristian, there's a JIRA ticket (https://issues.apache.org/jira/browse/SPARK-16365) where this issue has been discussed as well. I feel very strongly about the need for this feature. I've been implementing local versions of transformers as needed, which has made working with Spark ML much less

Re: Spark Local Pipelines

2017-03-13 Thread Asher Krim
Thanks for the feedback. If we strip away all of the fancy stuff, my proposal boils down to exposing the logic used in Spark's ML library. In an ideal world, Spark would possibly have relied on an existing ML implementation rather than reimplementing one, since there's very little that's Spark-specific

Re: Spark Local Pipelines

2017-03-13 Thread Dongjin Lee
Although I love Asher's cool idea, I'd rather +1 Sean's view; I think it would be much better for this to live outside of the project. Best, Dongjin On Mon, Mar 13, 2017 at 5:39 PM, Sean Owen wrote: > I'm skeptical. Serving synchronous queries from a model at scale is a >

Re: Spark Local Pipelines

2017-03-13 Thread Sean Owen
I'm skeptical. Serving synchronous queries from a model at scale is a fundamentally different activity. As you note, it doesn't logically involve Spark. If it has to happen in milliseconds it's going to be in-core. Scoring even 10 qps with a Spark job per request is probably a non-starter; think

Re: Spark Local Pipelines

2017-03-13 Thread Georg Heiler
Great idea. I see the same problem. I would suggest checking out the following projects as a starting point as well (not only MLeap): https://github.com/ucbrise/clipper and https://github.com/Hydrospheredata/mist Regards, Georg Asher Krim wrote on Sun, Mar 12, 2017 at 23:21: > Hi

Spark Local Pipelines

2017-03-12 Thread Asher Krim
Hi All, I spent a lot of time at Spark Summit East this year talking with Spark developers and committers about challenges with productizing Spark. One of the biggest shortcomings I've encountered in Spark ML pipelines is the lack of a way to serve single requests with any reasonable performance.