Re: MLLib SVM probability

Driesprong, Fokko Mon, 04 May 2015 06:30:20 -0700

Hi Robert,

I would say that the sign of each number represents the class of the
input vector. What kind of data are you using, and what kind of training
set? Fundamentally, an SVM can separate only two classes; you can do
one-vs-rest, as you mentioned.
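To make the sign interpretation concrete, here is a minimal sketch in plain Python (no Spark needed). It assumes the first column of the quoted output is the signed margin and the second column is the 0/1 class. The helper names `to_label` and `one_vs_rest` are made up for illustration and are not part of MLlib.

```python
# Signed margins and 0/1 values copied from the quoted output.
scores = [-18.841544889249917, 168.32916035523283, 420.67763915879794,
          -974.1942589201286, 71.73602841256813, 233.13636224524993,
          -1000.5902168199027]
labels = [0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0]


def to_label(score, threshold=0.0):
    """Map a signed margin to a binary class: above the threshold -> 1.0, else 0.0."""
    return 1.0 if score > threshold else 0.0


def one_vs_rest(margins_by_class):
    """Given one margin per binary classifier, pick the class with the largest margin."""
    return max(margins_by_class, key=margins_by_class.get)


predicted = [to_label(s) for s in scores]
print(predicted == labels)

# Hypothetical margins from three one-vs-rest classifiers for a single input.
print(one_vs_rest({"a": -3.0, "b": 1.5, "c": 0.2}))
```

Picking the largest margin is only one common way to combine one-vs-rest classifiers; the margins of separately trained models are not guaranteed to be on comparable scales.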


I don't see how LVQ can benefit the SVM classifier. I would say that this
is more of an SVM problem than a Spark one.
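On the Platt scaling question: the idea is to fit a sigmoid P(y=1 | f) = 1 / (1 + exp(A*f + B)) to the signed margins f on held-out data. A rough sketch in plain Python follows. It is not MLlib code, and it assumes plain gradient descent on the log loss rather than the optimizer and smoothed targets from Platt's paper. The margins and labels are made-up illustration data, and `fit_platt`, `platt_prob` and `normalize` are hypothetical helper names.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def platt_prob(f, a, b):
    """P(y=1 | f) = 1 / (1 + exp(a*f + b))."""
    return sigmoid(-(a * f + b))


def fit_platt(margins, labels, lr=0.1, steps=2000):
    """Fit a and b by gradient descent on the mean log loss."""
    a, b = 0.0, 0.0
    n = len(margins)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, y in zip(margins, labels):
            p = platt_prob(f, a, b)
            # With p = sigmoid(-(a*f + b)), d(loss)/da = (y - p) * f
            # and d(loss)/db = (y - p).
            grad_a += (y - p) * f
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def normalize(probs_by_class):
    """Rescale per-class one-vs-rest probabilities so they sum to 1 (a simple heuristic)."""
    total = sum(probs_by_class.values())
    return {c: p / total for c, p in probs_by_class.items()}


# Made-up, slightly overlapping held-out margins with their true 0/1 labels.
margins = [-2.0, -1.0, -0.5, 0.3, -0.2, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

a, b = fit_platt(margins, labels)
print(platt_prob(2.0, a, b), platt_prob(-2.0, a, b))
```

Margins as large as the ones in the quoted output (in the hundreds) would probably need rescaling or a smaller step size before a fit like this behaves well. For one-vs-rest, one sigmoid would be fitted per binary classifier, and `normalize` is just one simple way to turn the resulting values into something that sums to 1.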

2015-05-04 15:22 GMT+02:00 Robert Musters <robert.must...@openindex.io>:

>  Hi all,
>
> I am trying to understand the output of the SVM classifier.
>
> Right now, my output looks like this:
>
> -18.841544889249917 0.0
> 168.32916035523283 1.0
> 420.67763915879794 1.0
> -974.1942589201286 0.0
> 71.73602841256813 1.0
> 233.13636224524993 1.0
> -1000.5902168199027 0.0
>
>
>  The documentation is unclear about what these numbers mean
> <https://spark.apache.org/docs/0.9.2/api/mllib/index.html#org.apache.spark.mllib.regression.LabeledPoint>
> .
>
> I think it is the distance to the hyperplane with sign.
>
>
>  My main question is: How can I convert distances from hyperplanes to
> probabilities in a multi-class one-vs-all approach?
>
> LIBSVM <http://www.csie.ntu.edu.tw/~cjlin/libsvm/> has this functionality
> and refers to the process of obtaining the probabilities as “Platt scaling”
> <http://www.researchgate.net/profile/John_Platt/publication/2594015_Probabilistic_Outputs_for_Support_Vector_Machines_and_Comparisons_to_Regularized_Likelihood_Methods/links/004635154cff5262d6000000.pdf>.
>
>
> I think this functionality should be in MLLib, but I can't find it.
> Do you think Platt scaling makes sense?
>
>
>  Making clusters using Learning Vector Quantization, determining the
> spread function of a cluster with a Gaussian function and then retrieving
> the probability makes a lot more sense i.m.o. Using the distances from the
> hyperplanes from several SVM classifiers and then trying to determine some
> probability on these distance measures, does not make any sense, because
> the distribution property of the data-points belonging to a cluster is not
> taken into account.
> Does anyone see a fallacy in my reasoning?
>
>
>  With kind regards,
>
> Robert
>
