If you're trying to score a single example by way of an RDD or Dataset, then no, it will never be that fast. It's a whole distributed operation, and while you might manage low latency for one job at a time, consider what will happen when hundreds of them are running at once. It's just huge overkill for scoring a single example (but pretty fine for higher-latency, high-throughput batch operations).
However, if you're scoring a Vector locally, I can't imagine it's that slow. It does some linear algebra, but it's not that complicated. Even something unoptimized should be fast.

On Thu, Sep 1, 2016 at 1:37 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
> Hi
>
> Currently trying to use NaiveBayes to make predictions. But facing issues
> that doing the predictions takes order of few seconds. I tried with other
> model examples shipped with Spark but they also ran in minimum of 500 ms
> when I used Scala API. With
>
> Has anyone used spark ML to do predictions for a single row under 20 ms?
>
> I am not doing premature optimization. The use case is that we are doing
> real time predictions and we need results 20ms. Maximum 30ms. This is a hard
> limit for our use case.
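
To make the "some linear algebra" point concrete, here is a rough sketch of what scoring one feature vector with a multinomial naive Bayes model amounts to when done locally, with no RDD or Dataset involved. It is plain Python rather than Spark code, and everything in it is an assumption for illustration: the function name `predict_multinomial_nb`, the tiny 2-class model, and the idea that the log priors and log conditional probabilities have already been pulled out of a trained model (in Spark ML I believe these correspond to the model's `pi` and `theta`, but check that against your version).

```python
import math
import time


def predict_multinomial_nb(log_priors, log_likelihoods, features):
    """Return the index of the best-scoring class for one feature vector.

    score[c] = log_priors[c] + sum_j log_likelihoods[c][j] * features[j]
    """
    best_class, best_score = -1, float("-inf")
    for c, (prior, theta_c) in enumerate(zip(log_priors, log_likelihoods)):
        # One dot product per class, plus the class log prior.
        score = prior + sum(t * x for t, x in zip(theta_c, features))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical model: 2 classes, 3 features (values made up for the example).
log_priors = [math.log(0.5), math.log(0.5)]
log_likelihoods = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],  # class 0
    [math.log(0.1), math.log(0.2), math.log(0.7)],  # class 1
]
features = [3.0, 1.0, 0.0]  # term counts for a single row

start = time.perf_counter()
prediction = predict_multinomial_nb(log_priors, log_likelihoods, features)
elapsed_ms = (time.perf_counter() - start) * 1000.0

print("predicted class:", prediction)
print("elapsed ms:", elapsed_ms)
```

The work is one dot product per class, so the cost grows with (number of classes) x (number of features) and involves no job scheduling at all. Timing it this way on your own model size is a quick check on whether the 20 ms budget is threatened by the math itself or by the distributed machinery around it.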