[Scikit-learn-general] Prediction probabilities with sparse SVM

Brett Meyer Mon, 09 Jun 2014 12:38:00 -0700

I'm having an issue with the prediction probabilities from a sparse SVM: many
of the predictions come out the same for my test instances.  These
probabilities are produced during cross validation, and when I plot an ROC
curve for the folds, the results look very strange, with only a handful
of clustered points on the graph.  Here is my cross validation code, which I
based on the samples on the scikit-learn website:


skf = StratifiedKFold(y, n_folds=numfolds)

for train_index, test_index in skf:
    # split the training and testing sets
    X_train, X_test = X_scaled[train_index], X_scaled[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # train on the subset for this fold
    print 'Training on fold ' + str(fold)
    classifier = svm.SVC(C=C_val, kernel='rbf', gamma=gamma_val,
                         probability=True)
    probas_ = classifier.fit(X_train, y_train).predict_proba(X_test)

    # compute the ROC curve and the area under the curve
    fpr, tpr, thresholds = roc_curve(y_test, probas_[:, 1])
    mean_tpr += interp(mean_fpr, fpr, tpr)
    mean_tpr[0] = 0.0
    roc_auc = auc(fpr, tpr)

I'm just trying to figure out if there's something I'm obviously missing
here, since I used this same training set and SVM parameters with libsvm and
got much better results.  When I used libsvm, printed out the distances
from the hyperplane for the CV test instances, and then plotted the ROC, it
came out much more like I expected, with a much better AUC.  Any pointers
would be greatly appreciated!
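
For comparison, the closest scikit-learn counterpart to the libsvm hyperplane
distances is the output of SVC.decision_function, which can be used as the ROC
score in place of predict_proba.  The following is only a minimal sketch under
stated assumptions: the data is a synthetic stand-in from make_classification,
the C and gamma values are arbitrary placeholders, and it uses the current
sklearn.model_selection API rather than the one in the code above.  None of
these come from the original setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: the original X_scaled and y are not shown).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

skf = StratifiedKFold(n_splits=5)
fold_aucs = []
for train_index, test_index in skf.split(X_scaled, y):
    # C and gamma are placeholder values, not the poster's C_val / gamma_val.
    clf = SVC(C=1.0, kernel='rbf', gamma=0.1)
    clf.fit(X_scaled[train_index], y[train_index])

    # Decision values: signed, unnormalized distances from the hyperplane.
    # For a binary problem this is a 1-D array; positive favours clf.classes_[1].
    scores = clf.decision_function(X_scaled[test_index])
    fold_aucs.append(roc_auc_score(y[test_index], scores))

mean_auc = float(np.mean(fold_aucs))
print(mean_auc)
```

Because the ROC curve depends only on the ranking of the scores, decision
values can be passed to roc_curve or roc_auc_score directly, with no
probability calibration step in between.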

Brett Meyer
