How the model is built isn't that related to how it scores things. Here we're just talking about scoring. NaiveBayesModel can score Vector which is not a distributed entity. That's what you want to use. You do not want to use a whole distributed operation to score one record. This isn't related to .ml vs .mllib APIs.
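To illustrate the point that scoring one Vector locally is just a little linear algebra: a standalone sketch of the multinomial naive Bayes decision rule (log class prior plus a dot product with the per-class log-likelihoods, then argmax). This is not Spark's actual implementation; the priors and conditional probabilities below are made-up numbers for a toy two-class, two-feature model.

```scala
// Toy sketch of what naive Bayes scoring does per row.
// All numbers are hypothetical, not from any trained model.
object NBScoreSketch {
  // log P(class) for classes 0 and 1
  val logPrior: Array[Double] = Array(math.log(0.6), math.log(0.4))

  // log P(feature | class): one row per class, one column per feature
  val logTheta: Array[Array[Double]] = Array(
    Array(math.log(0.7), math.log(0.3)), // class 0
    Array(math.log(0.2), math.log(0.8))  // class 1
  )

  // Score each class as logPrior(c) + features . logTheta(c), take the argmax.
  def predict(features: Array[Double]): Int = {
    val scores = logPrior.indices.map { c =>
      logPrior(c) + features.indices.map(i => features(i) * logTheta(c)(i)).sum
    }
    scores.indices.maxBy(scores)
  }
}
```

A single call like `NBScoreSketch.predict(Array(3.0, 1.0))` is a handful of multiplies and adds, which is why local scoring should be microseconds, not seconds — no job scheduling, no shuffles, no distributed overhead.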
On Thu, Sep 1, 2016 at 2:01 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
> I understand your point.
>
> Is there something like a bridge? Is it possible to convert a model
> trained using a Dataset<Row> (i.e. the distributed one) to one which uses
> vectors? In Spark 1.6 the mllib packages did everything with vectors, and
> that should be faster as per my understanding. But many Spark blogs say
> that Spark is moving towards the ml package and the mllib package will be
> phased out. So how can someone train on huge data and then score it on a
> row-by-row basis?
>
> Thanks for your inputs.
>
> On Thu, Sep 1, 2016 at 6:15 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> If you're trying to score a single example by way of an RDD or
>> Dataset, then no, it will never be that fast. It's a whole distributed
>> operation, and while you might manage low latency for one job at a
>> time, consider what will happen when hundreds of them are running at
>> once. It's just huge overkill for scoring a single example (but
>> pretty fine for higher-latency, high-throughput batch operations).
>>
>> However, if you're scoring a Vector locally, I can't imagine it's that
>> slow. It does some linear algebra, but it's not that complicated. Even
>> something unoptimized should be fast.
>>
>> On Thu, Sep 1, 2016 at 1:37 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
>> > Hi
>> >
>> > Currently trying to use NaiveBayes to make predictions. But facing
>> > the issue that doing the predictions takes on the order of a few seconds.
>> > I tried other model examples shipped with Spark, but they also ran in a
>> > minimum of 500 ms when I used the Scala API.
>> >
>> > Has anyone used Spark ML to do predictions for a single row under 20 ms?
>> >
>> > I am not doing premature optimization. The use case is that we are doing
>> > real-time predictions and we need results in 20 ms, maximum 30 ms. This
>> > is a hard limit for our use case.