Hi y'all,

With the renewed interest in ML in Apache Spark, now seems as good a time as any to revisit the online serving situation in Spark ML. DB and others have done some excellent work moving many of the necessary tools into a local linear algebra package that doesn't depend on having a SparkContext.
There are a few different commercial and non-commercial solutions around this, but currently our individual transform/predict methods are private, so those projects either need to copy or re-implement them (or put themselves in org.apache.spark) to get access. How would folks feel about adding a new trait for ML pipeline stages that exposes transformation of single-element inputs (or local collections), optionally implemented by the stages that support it? That way we would have less copy-and-paste code that could drift out of sync with our model training.

I think continuing to have online serving grow in different projects is probably the right path forward (folks have different needs), but I'd love to see us make it simpler for other projects to build reliable serving tools. I realize this may put some folks in an awkward position with their own commercial offerings, but hopefully if we make it easier for everyone, the commercial vendors can benefit as well.

Cheers,

Holden :)

--
Twitter: https://twitter.com/holdenkarau
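P.S. To make the idea a bit more concrete, here's a minimal sketch of what an optional single-item transform interface might look like. This is written in Python for brevity (the real trait would of course live in Spark's Scala codebase), and all the names here (`SingleItemTransformer`, `transform_one`, `transform_local`, `ScalingModel`) are hypothetical, just to illustrate the shape of the proposal, not actual Spark API:

```python
# Hypothetical sketch, NOT Spark's actual API: an optional interface that
# pipeline stages could implement to support single-element prediction
# without a SparkContext. All names here are invented for illustration.
from abc import ABC, abstractmethod


class SingleItemTransformer(ABC):
    @abstractmethod
    def transform_one(self, row):
        """Transform a single input row locally (no SparkContext needed)."""

    def transform_local(self, rows):
        # Default: apply the single-item path over a local collection,
        # so stages only have to implement transform_one.
        return [self.transform_one(r) for r in rows]


class ScalingModel(SingleItemTransformer):
    """Toy 'model' holding learned weights; predicts a dot product."""

    def __init__(self, weights):
        self.weights = weights

    def transform_one(self, row):
        return sum(x * w for x, w in zip(row, self.weights))


model = ScalingModel([0.5, 2.0])
print(model.transform_one([1.0, 3.0]))      # 6.5
print(model.transform_local([[2.0, 1.0]]))  # [3.0]
```

The point is that serving projects could program against the optional interface and skip the copy-and-paste, while stages that can't support local transformation simply don't implement it.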