For the multi-class case you can still use SVMWithSGD (a binary classifier) with a one-vs-all approach: for each class i, build a training set in which the samples of class i are the positives and the samples of all the other classes are the negatives, and train one model per class. Then use the method Aris described (clearing the threshold to get the raw scores) as a measure of how far a point is from the decision boundary of class i.
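
A minimal sketch of that scheme, in plain Scala with no Spark dependency, in case it helps. LinearScorer, OneVsRest, predictConfident, minScore and minGap are hypothetical names of mine, not part of MLlib. I am assuming that in practice the weights and intercept of each scorer would be copied from the SVMModel trained for class i (model.weights.toArray and model.intercept), and the cut-off values would need tuning on your own data.

```scala
// One linear scorer per class. Assumption: in MLlib these values would come
// from model.weights.toArray and model.intercept of each one-vs-all SVMModel.
final case class LinearScorer(weights: Array[Double], intercept: Double) {
  // Raw margin w.dot(x) + b: the sign gives the predicted side,
  // the magnitude is the confidence measure discussed in this thread.
  def score(x: Array[Double]): Double = {
    require(x.length == weights.length, "feature/weight length mismatch")
    weights.zip(x).map { case (w, xi) => w * xi }.sum + intercept
  }
}

object OneVsRest {
  // Returns Some(classIndex) only when the winning class scores at least
  // minScore and beats the runner-up by at least minGap; otherwise None,
  // meaning "not confident enough, discard this instance".
  def predictConfident(
      scorers: Seq[LinearScorer],
      x: Array[Double],
      minScore: Double,
      minGap: Double): Option[Int] = {
    require(scorers.nonEmpty, "need at least one scorer")
    // Score x against every class and rank classes from highest to lowest.
    val ranked = scorers.map(_.score(x)).zipWithIndex.sortBy { case (s, _) => -s }
    val (best, bestClass) = ranked.head
    val runnerUp = if (ranked.size > 1) ranked(1)._1 else Double.NegativeInfinity
    if (best >= minScore && best - runnerUp >= minGap) Some(bestClass) else None
  }
}
```

With three scorers whose weights are (1, 0), (0, 1) and (-1, -1), all with intercept 0, the point (3.0, 0.5) scores 3.0, 0.5 and -3.5, so predictConfident(scorers, Array(3.0, 0.5), 1.0, 1.0) returns Some(0). The point (1.1, 1.0) scores 1.1 and 1.0 for the top two classes, so the same call returns None because the gap is too small.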

On Wed, Sep 24, 2014 at 4:06 PM, Aris <arisofala...@gmail.com> wrote:

> Greetings, Adamantios Korais... if that is indeed your name.
>
> Just to follow up on Liquan, you might be interested in removing the
> threshold and then treating the raw predictions as a confidence score.
> Note that these scores are unbounded margins, not probabilities in 0..1.
> SVM with the linear kernel is a straightforward linear classifier, so
> with model.clearThreshold() you can just get the raw predicted scores,
> removing the threshold that simply translates them into a
> positive/negative class.
>
> API is here
> http://yhuai.github.io/site/api/scala/index.html#org.apache.spark.mllib.classification.SVMModel
>
> Enjoy!
> Aris
>
> On Sun, Sep 21, 2014 at 11:50 PM, Liquan Pei <liquan...@gmail.com> wrote:
>
>> Hi Adamantios,
>>
>> For your first question, after you train the SVM, you get a model with a
>> vector of weights w and an intercept b. Points x such that
>> w.dot(x) + b = 1 or w.dot(x) + b = -1 lie on the margin; the decision
>> boundary itself is w.dot(x) + b = 0. The quantity w.dot(x) + b for a
>> point x is a confidence measure of the classification.
>>
>> Code-wise, suppose you trained your model via
>>
>> val model = SVMWithSGD.train(...)
>>
>> You can then call
>>
>> model.setThreshold(your threshold here)
>>
>> to set the threshold that separates positive predictions from negative
>> predictions.
>>
>> For more info, please take a look at
>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.classification.SVMModel
>>
>> For your second question, SVMWithSGD only supports binary classification.
>>
>> Hope this helps,
>>
>> Liquan
>>
>> On Sun, Sep 21, 2014 at 11:22 PM, Adamantios Corais <
>> adamantios.cor...@gmail.com> wrote:
>>
>>> Nobody?
>>>
>>> If that's not supported already, can you please at least give me a few
>>> hints on how to implement it?
>>>
>>> Thanks!
>>>
>>> On Fri, Sep 19, 2014 at 7:43 PM, Adamantios Corais <
>>> adamantios.cor...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am working with the SVMWithSGD classification algorithm on Spark. It
>>>> works fine for me; however, I would like to distinguish the instances
>>>> that are classified with high confidence from those classified with
>>>> low confidence. How do we define the threshold here? Ultimately, I want
>>>> to keep only those for which the algorithm is very *very* certain about
>>>> its decision! How can I do that? Is this feature already supported by
>>>> any MLlib algorithm? What if I had multiple categories?
>>>>
>>>> Any input is highly appreciated!
>>
>> --
>> Liquan Pei
>> Department of Physics
>> University of Massachusetts Amherst