Hi y'all,

With the renewed interest in ML in Apache Spark, now seems as good a time as any to revisit the online serving situation in Spark ML. DB and others have done some excellent work moving many of the necessary tools into a local linear algebra package that doesn't depend on having a SparkContext.
There are a few different commercial and non-commercial solutions around this, but currently our individual transform/predict methods are private, so those projects either need to copy or re-implement them (or put themselves in org.apache.spark) to get access. How would folks feel about adding a new trait for ML pipeline stages that exposes transformation of single-element inputs (or local collections), optionally implemented by the stages that support it? That way we would have less copy-and-paste code that could drift out of sync with our model training.

I think continuing to have online serving grow in different projects is probably the right path forward (folks have different needs), but I'd love to see us make it simpler for other projects to build reliable serving tools. I realize this may put some folks in an awkward position with their own commercial offerings, but hopefully if we make it easier for everyone, the commercial vendors can benefit as well.

Cheers,

Holden :)

--
Twitter: https://twitter.com/holdenkarau
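P.S. To make the idea a bit more concrete, here's a minimal sketch of what an optional single-item transform interface might look like. This is written in Python for brevity (the real trait would of course live in Spark's Scala codebase), and all the names here (`SingleItemTransformer`, `transform_one`, `transform_local`, `ScalingModel`) are hypothetical, just to illustrate the shape of the proposal, not actual Spark API:

```python
# Hypothetical sketch, NOT Spark's actual API: an optional interface that
# pipeline stages could implement to support single-element prediction
# without a SparkContext. All names here are invented for illustration.
from abc import ABC, abstractmethod


class SingleItemTransformer(ABC):
    @abstractmethod
    def transform_one(self, row):
        """Transform a single input row locally (no SparkContext needed)."""

    def transform_local(self, rows):
        # Default: apply the single-item path over a local collection,
        # so stages only have to implement transform_one.
        return [self.transform_one(r) for r in rows]


class ScalingModel(SingleItemTransformer):
    """Toy 'model' holding learned weights; predicts a dot product."""

    def __init__(self, weights):
        self.weights = weights

    def transform_one(self, row):
        return sum(x * w for x, w in zip(row, self.weights))


model = ScalingModel([0.5, 2.0])
print(model.transform_one([1.0, 3.0]))      # 6.5
print(model.transform_local([[2.0, 1.0]]))  # [3.0]
```

The point is that serving projects could program against the optional interface and skip the copy-and-paste, while stages that can't support local transformation simply don't implement it.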