Re: Spark 2.0.0 - has anyone used spark ML to do predictions under 20ms?

2016-09-02 Thread Aseem Bansal
Hi, thanks for all the details. I was able to convert from ml.NaiveBayesModel to mllib.NaiveBayesModel and got it working. It is fast enough for our use case. Just one question: before mllib is removed, can the ml package be expected to reach feature parity with mllib? On Thu, Sep 1, 2016 at 7:12 PM, Sean Owen

2016-09-01 Thread Sean Owen
Yeah, there's a method to predict a single Vector in the .mllib API, but not in the newer one. You could possibly hack your way into calling it anyway, or just clone the logic. On Thu, Sep 1, 2016 at 2:37 PM, Nick Pentreath wrote: > Right now you are correct that Spark ML APIs do
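"Just clone the logic" can be illustrated for the multinomial case. The sketch below is plain Scala with no Spark dependency. It assumes the model parameters have already been pulled out as arrays, with `pi` holding log class priors and `theta` holding log conditional probabilities (one row per class), and it returns the label of the class that maximizes `pi(i) + theta(i) · x`. The names `LocalNaiveBayes` and `Params` are hypothetical, and the Bernoulli model type, which scores differently, is not covered.

```scala
// Hypothetical Spark-free scorer for a multinomial Naive Bayes model.
// Assumes pi(i) = log P(class i) and theta(i)(j) = log P(feature j | class i).
object LocalNaiveBayes {

  final case class Params(
      labels: Array[Double],       // label for each class (one per row of theta)
      pi: Array[Double],           // log class priors
      theta: Array[Array[Double]]  // log conditional probabilities, one row per class
  )

  /** Scores one dense feature vector on the local JVM and returns the predicted label. */
  def predict(p: Params, x: Array[Double]): Double = {
    require(p.pi.length == p.theta.length && p.labels.length == p.pi.length,
      "labels, pi and theta must describe the same number of classes")
    require(p.theta.forall(_.length == x.length), "feature dimension mismatch")

    // Unnormalized log posterior per class: log prior + sum_j x_j * log P(j | class).
    val scores = p.pi.indices.map { i =>
      p.pi(i) + p.theta(i).zip(x).map { case (t, v) => t * v }.sum
    }
    // Index of the highest-scoring class, mapped back to its label.
    p.labels(scores.indices.maxBy(i => scores(i)))
  }
}
```

In Spark 2.0 the .mllib `NaiveBayesModel` exposes `labels`, `pi` and `theta` as arrays, and the .ml model exposes `pi` as a `Vector` and `theta` as a `Matrix`, so either could presumably feed a function like this after conversion; treat the exact wiring as an assumption to check against your Spark version.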

2016-09-01 Thread Nick Pentreath
I should also point out that right now your only option is to code up your own export functionality (or make your serving system able to read Spark's format), translate that into the correct format for some other linear algebra or ML library, and use that for serving. On Thu, 1 Sep 2016 at
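Custom export functionality can be as small as writing the fitted parameters to a format the serving system can parse without Spark. The sketch assumes a hypothetical comma-separated text layout (labels on the first line, log priors on the second, then one line per row of `theta`); it is not a Spark format, only one of many possible choices, and `NaiveBayesExport` is an illustrative name.

```scala
// Hypothetical plain-text export/import of Naive Bayes parameters.
// Layout (an assumption, not a Spark format):
//   line 1: labels, line 2: pi, lines 3..n: rows of theta; values comma-separated.
object NaiveBayesExport {

  final case class Params(labels: Array[Double], pi: Array[Double], theta: Array[Array[Double]])

  /** Serializes the parameters, one comma-separated line per array. */
  def write(p: Params): String = {
    val rows: Seq[Array[Double]] = Seq(p.labels, p.pi) ++ p.theta.toSeq
    rows.map(_.mkString(",")).mkString("\n")
  }

  /** Parses the layout produced by write back into parameters. */
  def read(s: String): Params = {
    val rows = s.split("\n").map(_.split(",").map(_.toDouble))
    require(rows.length >= 3, "expected labels, pi and at least one theta row")
    Params(rows(0), rows(1), rows.drop(2))
  }
}
```

Plain text keeps the serving side free of Spark dependencies, and `Double.toString` output parses back to the same value, so the round trip is exact; JSON or a binary format would work just as well.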

2016-09-01 Thread Nick Pentreath
Right now you are correct that Spark ML APIs do not support predicting on a single instance (whether a Vector for the models or a Row for a pipeline). See https://issues.apache.org/jira/browse/SPARK-10413 and https://issues.apache.org/jira/browse/SPARK-16431 (duplicate) for some discussion. There

2016-09-01 Thread Aseem Bansal
I understand from a theoretical perspective that the model itself is not distributed, so it can be used to make predictions for a single vector or for an RDD. But in terms of the APIs provided by Spark 2.0.0, when I create a model from large data the recommended way is to use the ml library for

2016-09-01 Thread Sean Owen
How the model is built isn't really related to how it scores things. Here we're just talking about scoring. NaiveBayesModel can score a Vector, which is not a distributed entity. That's what you want to use. You do not want to use a whole distributed operation to score one record. This isn't related to

2016-09-01 Thread Aseem Bansal
I understand your point. Is there something like a bridge? Is it possible to convert the model trained using a Dataset (i.e. the distributed one) to one that works with vectors? In Spark 1.6 the mllib packages did everything in terms of vectors, and as I understand it that should be faster. But in
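One small piece of such a bridge is reshaping parameters. In Spark 2.0 the .ml `NaiveBayesModel` holds `theta` as a `Matrix`, whereas the .mllib model uses `Array[Array[Double]]` with one row per class, and `Matrix.toArray` returns values in column-major order. The hypothetical helper below is a Spark-free sketch of only that reshaping step, assuming the input really is column-major; how the resulting arrays are handed to an mllib model is not covered here.

```scala
// Hypothetical helper: reshape a column-major flat array (e.g. what Matrix.toArray
// returns) into one array per row, the layout mllib's NaiveBayesModel uses for theta.
object MatrixBridge {

  def columnMajorToRows(numRows: Int, numCols: Int, values: Array[Double]): Array[Array[Double]] = {
    require(values.length == numRows * numCols, "values length must equal numRows * numCols")
    // In column-major order, element (i, j) lives at offset i + j * numRows.
    Array.tabulate(numRows, numCols)((i, j) => values(i + j * numRows))
  }
}
```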

2016-09-01 Thread Sean Owen
If you're trying to score a single example by way of an RDD or Dataset, then no, it will never be that fast. It's a whole distributed operation, and while you might manage low latency for one job at a time, consider what will happen when hundreds of them are running at once. It's just huge overkill
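For contrast, once a model's parameters are held locally as immutable values, many independent requests can be scored concurrently on ordinary threads without launching a Spark job per request. The sketch uses a hypothetical linear model (`bias + weights · x`) as a stand-in for any locally held model, and Scala's standard `Future`s for concurrency; none of it is Spark API.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ConcurrentScoring {

  // Hypothetical immutable model: safe to share across threads because nothing mutates it.
  final case class LinearModel(bias: Double, weights: IndexedSeq[Double]) {
    def score(x: IndexedSeq[Double]): Double = {
      require(x.length == weights.length, "feature dimension mismatch")
      bias + weights.zip(x).map { case (w, v) => w * v }.sum
    }
  }

  /** Scores independent requests in parallel on the local thread pool, preserving order. */
  def scoreAll(model: LinearModel, requests: Seq[IndexedSeq[Double]]): Seq[Double] = {
    val futures = requests.map(x => Future(model.score(x)))
    Await.result(Future.sequence(futures), 10.seconds)
  }
}
```

Each request costs a few arithmetic operations on a shared read-only object, so throughput grows with available cores rather than with cluster scheduling overhead.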

Spark 2.0.0 - has anyone used spark ML to do predictions under 20ms?

2016-09-01 Thread Aseem Bansal
Hi, I am currently trying to use NaiveBayes to make predictions, but the predictions take on the order of a few seconds. I tried other model examples shipped with Spark, but they also took a minimum of 500 ms when I used the Scala API. With Has anyone used Spark ML to do predictions