Thank you, Sean. I'll try to do it externally as you suggested. However, could you please give me some hints on how to do that? Also, where can I find the 1.2 implementation you just mentioned? Thanks!
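As a hint for doing it externally, here is a minimal sketch, assuming you already have the model's weights as an `Array[Double]` and its intercept as a `Double` (for example, copied out of the trained model). It computes the linear decision function w·x + b and then applies the same kind of threshold that, as described in the thread, `predict()` uses to return 0 or 1. The object and method names (`DecisionFunctionSketch`, `decisionFunction`, `predictLabel`) are hypothetical and not part of Spark's API, and the default threshold of 0.0 is an assumption.

```scala
// Hypothetical helper, not part of Spark: computes w·x + b outside Spark
// and thresholds it into a 0/1 label (Spark's label convention).
object DecisionFunctionSketch {
  // Linear SVM decision function: dot product of features and weights, plus intercept.
  def decisionFunction(x: Array[Double], weights: Array[Double], intercept: Double): Double = {
    require(x.length == weights.length, "feature and weight vectors must have equal length")
    x.zip(weights).map { case (xi, wi) => xi * wi }.sum + intercept
  }

  // Hard label: 1.0 if the score exceeds the threshold, else 0.0.
  // The 0.0 default threshold is an assumption for this sketch.
  def predictLabel(score: Double, threshold: Double = 0.0): Double =
    if (score > threshold) 1.0 else 0.0

  def main(args: Array[String]): Unit = {
    val weights = Array(0.5, -1.0) // made-up example weights
    val intercept = 0.25           // made-up example intercept
    val point = Array(2.0, 0.5)
    val score = decisionFunction(point, weights, intercept)
    println(s"score = $score, label = ${predictLabel(score)}")
  }
}
```

On an RDD of feature arrays, the same `decisionFunction` can be called inside `map`, exactly like the `predictions2` snippet quoted further down; keeping the raw score (rather than the 0/1 label) is what you need as input for any probability estimate.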
On Wed, Oct 8, 2014 at 12:58 PM, Sean Owen <so...@cloudera.com> wrote:

> Plain old SVMs don't produce an estimate of class probabilities;
> predict_proba() does some additional work to estimate class
> probabilities from the SVM output. Spark does not implement this right
> now.
>
> Spark implements the equivalent of decision_function (the wTx + b bit)
> but does not expose it, and instead gives you predict(), which gives 0
> or 1 depending on whether the decision function exceeds the specified
> threshold.
>
> Yes, you can roll your own, just like you did to calculate the decision
> function from weights and intercept. I suppose it would be nice to
> expose it (do I hear a PR?), but it's not hard to do externally. You'll
> have to do this anyway if you're on anything earlier than 1.2.
>
> On Wed, Oct 8, 2014 at 10:17 AM, Adamantios Corais
> <adamantios.cor...@gmail.com> wrote:
> > OK, let me rephrase my question once again. Python-wise, I prefer
> > .predict_proba(X) over .decision_function(X) since it is easier for
> > me to interpret the results. As far as I can see, the latter
> > functionality is already implemented in Spark (well, in version 0.9.2,
> > for example, I have to compute the dot product on my own, otherwise I
> > get 0 or 1), but the former is not implemented (yet!). What should I
> > do, i.e., how do I implement that one in Spark as well? What are the
> > required inputs here, and what does the formula look like?
> >
> > On Tue, Oct 7, 2014 at 10:04 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> It looks like you are directly computing the SVM decision function in
> >> both cases:
> >>
> >> val predictions2 = m_users_double.map{point=>
> >>   point.zip(weights).map(a=> a._1 * a._2).sum + intercept
> >> }.cache()
> >>
> >> clf.decision_function(T)
> >>
> >> This does not give you +1/-1 in SVMs (well... not for most points,
> >> which will be outside the margin around the separating hyperplane).
> >>
> >> You can use the predict() function in SVMModel, which will give you
> >> 0 or 1 (rather than +1/-1, but that's just a differing convention)
> >> depending on the sign of the decision function. I don't know if this
> >> was in 0.9.
> >>
> >> At the moment, I assume you saw small values of the decision function
> >> in scikit-learn because of the radial basis function kernel.
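On the question of what the inputs and formula for a predict_proba()-style estimate look like: one well-known technique for turning SVM decision values into probabilities is Platt scaling, which, as far as we know, is what scikit-learn's SVC uses (via libsvm) when `probability=True`. It fits a sigmoid P(y = 1 | f) = 1 / (1 + exp(A·f + B)) to pairs of (decision value f, true label), ideally on held-out data rather than the SVM's own training set. The sketch below is a simplified, hypothetical implementation, not Spark or scikit-learn code: `PlattScalingSketch`, `fit` and `probability` are made-up names, and it uses plain gradient descent with a fixed learning rate instead of the Newton-style optimizer used in libsvm. It does keep Platt's smoothed targets in place of hard 0/1 labels.

```scala
// Hypothetical sketch of Platt scaling (not Spark or scikit-learn code):
// maps decision values f to P(y = 1 | f) = 1 / (1 + exp(A * f + B)).
object PlattScalingSketch {
  // Probability for one decision value, given fitted parameters A and B.
  def probability(f: Double, a: Double, b: Double): Double =
    1.0 / (1.0 + math.exp(a * f + b))

  /** Fits (A, B) by gradient descent on the negative log-likelihood.
    * scores: decision values (preferably on held-out data); labels: 0.0 or 1.0.
    * Simplification: fixed-step gradient descent instead of a Newton method. */
  def fit(scores: Array[Double], labels: Array[Double],
          learningRate: Double = 0.1, iterations: Int = 5000): (Double, Double) = {
    require(scores.length == labels.length && scores.nonEmpty)
    val nPos = labels.count(_ == 1.0)
    val nNeg = labels.length - nPos
    // Platt's smoothed targets, which keep A and B finite even on separable data.
    val tPos = (nPos + 1.0) / (nPos + 2.0)
    val tNeg = 1.0 / (nNeg + 2.0)
    val targets = labels.map(y => if (y == 1.0) tPos else tNeg)
    val n = scores.length.toDouble
    var a = 0.0
    var b = 0.0
    for (_ <- 0 until iterations) {
      var gradA = 0.0
      var gradB = 0.0
      var i = 0
      while (i < scores.length) {
        // Derivative of the per-point NLL w.r.t. z = A * f + B is (t - p).
        val diff = targets(i) - probability(scores(i), a, b)
        gradA += diff * scores(i)
        gradB += diff
        i += 1
      }
      a -= learningRate * gradA / n
      b -= learningRate * gradB / n
    }
    (a, b)
  }

  def main(args: Array[String]): Unit = {
    // Made-up decision values and labels, for illustration only.
    val scores = Array(-2.0, -1.5, -1.0, -0.5, -0.2, 0.3, 0.5, 1.0, 1.5, 2.0)
    val labels = Array(0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0)
    val (a, b) = fit(scores, labels)
    println(f"A = $a%.3f, B = $b%.3f")
    scores.foreach(s => println(f"f = $s%5.2f -> P(y=1) = ${probability(s, a, b)}%.3f"))
  }
}
```

So the required inputs are the raw decision values (w·x + b, as in the first snippet of the thread) plus the true labels for a calibration set; the output is two scalars, A and B. A fitted A is normally negative, so larger decision values map to higher probabilities.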