How the model is built has little to do with how it scores things.
Here we're only talking about scoring. NaiveBayesModel can score a
Vector, which is not a distributed entity. That's what you want to use.
You do not want to run a whole distributed operation to score one
record. This isn't a question of the .ml vs .mllib APIs.
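
To make the point about local scoring concrete, here is a rough sketch of
what scoring one vector against a multinomial naive Bayes model amounts to:
a dot product per class plus an argmax. It is plain Python and does not call
Spark at all. The function name predict_local and the model parameters are
made up for illustration. As far as I know they correspond to the log priors
(pi) and log conditional probabilities (theta) that a trained
NaiveBayesModel carries, but treat that mapping as an assumption.

```python
import math


def predict_local(log_priors, log_likelihoods, features):
    """Return the index of the best-scoring class for one feature vector.

    log_priors[c] is the log prior of class c; log_likelihoods[c][j] is the
    log conditional probability of feature j given class c. Both are
    hypothetical stand-ins for the parameters of a trained model.
    """
    best_label, best_score = None, -math.inf
    for label, (prior, theta) in enumerate(zip(log_priors, log_likelihoods)):
        # Multinomial naive Bayes score: log prior + theta . x
        score = prior + sum(t * x for t, x in zip(theta, features))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up two-class, three-feature model, for illustration only.
LOG_PRIORS = [math.log(0.5), math.log(0.5)]
LOG_LIKELIHOODS = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],
    [math.log(0.1), math.log(0.2), math.log(0.7)],
]

if __name__ == "__main__":
    # One record is scored in-process; no RDD, Dataset or job is involved.
    print(predict_local(LOG_PRIORS, LOG_LIKELIHOODS, [3.0, 1.0, 0.0]))
```

The work is proportional to (number of classes) x (number of features),
which is why scoring a single Vector locally should sit comfortably inside
a budget of a few milliseconds, whereas a distributed job pays scheduling
overhead no matter how small the input is.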

On Thu, Sep 1, 2016 at 2:01 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
> I understand your point.
>
> Is there something like a bridge? Is it possible to convert a model
> trained using Dataset<Row> (i.e. the distributed one) to one that works on
> vectors? In Spark 1.6 the mllib packages worked on vectors throughout, which
> should be faster as I understand it. But many Spark blogs say that Spark is
> moving towards the ml package and that the mllib package will be phased
> out. So how can someone train on huge data and then use the model on a
> row-by-row basis?
>
> Thanks for your inputs.
>
> On Thu, Sep 1, 2016 at 6:15 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> If you're trying to score a single example by way of an RDD or
>> Dataset, then no, it will never be that fast. It's a whole distributed
>> operation, and while you might manage low latency for one job at a
>> time, consider what will happen when hundreds of them are running at
>> once. It's just huge overkill for scoring a single example (but
>> perfectly fine for higher-latency, high-throughput batch operations).
>>
>> However if you're scoring a Vector locally I can't imagine it's that
>> slow. It does some linear algebra but it's not that complicated. Even
>> something unoptimized should be fast.
>>
>> On Thu, Sep 1, 2016 at 1:37 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
>> > Hi
>> >
>> > I am currently trying to use NaiveBayes to make predictions, but each
>> > prediction takes on the order of a few seconds. I tried other model
>> > examples shipped with Spark, but they also took a minimum of 500 ms
>> > when I used the Scala API. With
>> >
>> > Has anyone used spark ML to do predictions for a single row under 20 ms?
>> >
>> > I am not doing premature optimization. The use case is real-time
>> > prediction, and we need results in 20 ms, 30 ms at most. This is a
>> > hard limit for our use case.
>
>

