OK, let me rephrase my question once again. Python-wise, I prefer .predict_proba(X) over .decision_function(X), since its results are easier for me to interpret. As far as I can see, the latter is already implemented in Spark (well, in version 0.9.2, for example, I have to compute the dot product myself, otherwise I get 0 or 1), but the former is not implemented (yet!). How could I implement it in Spark as well? What are the required inputs, and what does the formula look like?
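To make my question concrete: my understanding is that scikit-learn's predict_proba for SVMs is based on Platt scaling, i.e. squashing the raw margin through a sigmoid whose parameters are fit on held-out data. A minimal sketch of that idea (the weights, intercept, and the fixed A/B below are placeholder values, not fitted ones) would look like this:

```python
import math

# Placeholder model parameters, standing in for the weights and
# intercept you would pull out of a trained Spark SVMModel.
weights = [0.8, -0.5, 1.2]
intercept = -0.1

def decision_function(point):
    # Raw SVM margin: w . x + b (the dot product I already compute
    # by hand in 0.9.2).
    return sum(w * x for w, x in zip(weights, point)) + intercept

def predict_proba(point, A=-1.0, B=0.0):
    # Platt scaling: P(y=1 | x) = 1 / (1 + exp(A * f(x) + B)).
    # A and B should really be fit by maximum likelihood on held-out
    # (margin, label) pairs; A=-1, B=0 just gives a plain logistic
    # sigmoid and is used here only as a placeholder.
    return 1.0 / (1.0 + math.exp(A * decision_function(point) + B))
```

So if that's the right formula, the required inputs beyond the margin would be the two Platt coefficients A and B, fit on a calibration set. Is that how it would have to be done in Spark too, or is there a simpler route?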
On Tue, Oct 7, 2014 at 10:04 PM, Sean Owen <so...@cloudera.com> wrote:
> It looks like you are directly computing the SVM decision function in
> both cases:
>
> val predictions2 = m_users_double.map{point=>
>   point.zip(weights).map(a=> a._1 * a._2).sum + intercept
> }.cache()
>
> clf.decision_function(T)
>
> This does not give you +1/-1 in SVMs (well... not for most points,
> which will be outside the margin around the separating hyperplane).
>
> You can use the predict() function in SVMModel -- which will give you
> 0 or 1 (rather than +/- 1, but that's just a differing convention)
> depending on the sign of the decision function. I don't know if this
> was in 0.9.
>
> At the moment I assume you saw small values of the decision function
> in scikit because of the radial basis function.
>
> On Tue, Oct 7, 2014 at 7:45 PM, Sunny Khatri <sunny.k...@gmail.com> wrote:
> > Not familiar with scikit's SVM implementation (and I assume you are
> > using LinearSVC). To figure out an optimal decision boundary based on
> > the scores obtained, you can use an ROC curve, varying your thresholds.