Hi Cristian,
There's a JIRA (https://issues.apache.org/jira/browse/SPARK-16365) where
this issue has been discussed as well. I feel very strongly about the need
for this feature. I've been implementing local versions of transformers as
needed, which has made working with Spark ML much less pleasant than it
could be.
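For context, a minimal sketch of the kind of "local transformer" I mean: re-applying a fitted StandardScaler's statistics to a single feature vector with no SparkContext at all. The mean/std values below are illustrative placeholders standing in for numbers copied out of a hypothetical fitted model, not Spark API calls:

```python
# Hypothetical statistics extracted from a fitted Spark ML
# StandardScalerModel (its mean and std vectors). The numbers are
# made up for illustration.
MEAN = [5.0, 100.0]
STD = [2.0, 25.0]

def scale_row(features, mean=MEAN, std=STD):
    """Apply (x - mean) / std per feature, i.e. what StandardScaler
    does with withMean=True and withStd=True, but in-process."""
    return [(x - m) / s for x, m, s in zip(features, mean, std)]

print(scale_row([7.0, 150.0]))  # -> [1.0, 2.0]
```

The point is that the transform itself is a few lines of arithmetic; the pain is in having to re-derive and maintain it by hand for every transformer in a pipeline.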
Thanks for the feedback.
If we strip away all of the fancy stuff, my proposal boils down to exposing
the logic used in Spark's ML library. In an ideal world, Spark would
possibly have relied on an existing ML implementation rather than
reimplementing one, since there's very little about that logic that is
actually Spark-specific.
Although I love Asher's idea, I'd rather +1 Sean's view; I think this
would be much better maintained outside of the project.
Best,
Dongjin
On Mon, Mar 13, 2017 at 5:39 PM, Sean Owen wrote:
I'm skeptical. Serving synchronous queries from a model at scale is a
fundamentally different activity. As you note, it doesn't logically involve
Spark. If it has to happen in milliseconds it's going to be in-core.
Scoring even 10 qps with a Spark job per request is probably a non-starter;
think of the overhead of launching a distributed job for every single call.
Great idea. I see the same problem.
I would suggest checking out the following projects as a kick start as well
(not only MLeap):
https://github.com/ucbrise/clipper and
https://github.com/Hydrospheredata/mist
Regards,
Georg
Asher Krim wrote on Sun., Mar 12, 2017 at 23:21:
Hi All,
I spent a lot of time at Spark Summit East this year talking with Spark
developers and committers about challenges with productizing Spark. One of
the biggest shortcomings I've encountered in Spark ML pipelines is the lack
of a way to serve single requests with any reasonable performance.