> On Sat, Jan 16, 2016 at 5:40 AM, Kaustubh Patil <[email protected]> wrote:
> BTW another way to handle imbalanced data (and perhaps easier to implement and test) could be to assign weights in libsvm. This has to be done for each partition separately; any ideas on how this can be done?
> Thanks
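A minimal sketch of the per-partition weighting idea, assuming scikit-learn's SVC as a stand-in for PyMVPA's libsvm wrapper (the data, kernel, and fold count are made up for illustration). With class_weight='balanced', the weights are recomputed from each training fold on its own, which is exactly the "for each partition separately" requirement:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = np.array([0] * 80 + [1] * 20)          # imbalanced labels (hypothetical data)

for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    # class_weight='balanced' derives the weights from the *training* fold only,
    # as n_samples / (n_classes * n_samples_in_class), so each partition gets
    # its own weights automatically.
    clf = SVC(kernel='linear', class_weight='balanced')
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
```

With an 80/20 class ratio, each 80-sample training fold contains 64 and 16 samples, so the fitted weights come out as 0.625 and 2.5 for the majority and minority class respectively.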
Just to note that this functionality is currently broken in PyMVPA: https://github.com/PyMVPA/PyMVPA/issues/40
You might be able to plug in an SVM from scikit-learn if it works there. However, based on my poor understanding of math, statistics and machine learning, class weighting should converge to up-sampling. So maybe up-sample?

Second, both are suboptimal, because you are indeed moving the decision boundary towards the minority class, but the angle of the boundary will be wrong; therefore less naive resampling like SMOTE or ROSE, in combination with moving the cutoff value, is advised. Unfortunately I cannot find the paper where it was written.

Third, everything breaks in high-dimensional data, and just moving the cutoff (threshold) value of an SVM or similar classifier might be enough: http://www.ncbi.nlm.nih.gov/pubmed/22408190

> On Fri, Jan 15, 2016 at 11:28 PM, Kaustubh Patil <[email protected]> wrote:
> Thanks again Yaroslav.
> I agree that the classifier might end up giving 0 or very small balanced accuracy (macro-averaged accuracy) values, but I think that's still a better measure than using overall (micro-averaged) accuracy.

What do you mean by balanced accuracy? What I know by that name is the mean of sensitivity and specificity, i.e. (sensitivity + specificity) / 2. You are not supposed to get 0 with that.

> There are a couple of other measures that can be useful for imbalanced datasets:
> 1. A-mean: arithmetic mean of the class-wise accuracies, i.e. average per-class accuracy
> 2. G-mean: geometric mean instead of the arithmetic mean above
> 3. F-measure
> 4. Area under the ROC curve

You can also use Cohen's kappa and the area under the precision-recall curve.

> Of course a better solution would be to use a classifier that can handle imbalanced datasets, as you suggested. I have previously used SVMperf, which can optimize AUROC: https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html
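For reference, the measures in that list can be sketched with numpy and scikit-learn (the label/score arrays are invented for illustration, and the mapping of the names onto sklearn functions is my reading, not PyMVPA API):

```python
import numpy as np
from sklearn.metrics import recall_score, f1_score, roc_auc_score

y_true  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])          # toy imbalanced labels
y_score = np.array([0.1, 0.2, 0.2, 0.3, 0.6, 0.4,           # toy classifier scores
                    0.9, 0.7, 0.8, 0.35])
y_pred  = (y_score >= 0.5).astype(int)                       # default 0.5 cutoff

recalls = recall_score(y_true, y_pred, average=None)  # per-class accuracy (sensitivity/specificity)
a_mean  = recalls.mean()                              # A-mean == balanced accuracy
g_mean  = np.sqrt(recalls.prod())                     # G-mean (binary case)
f1      = f1_score(y_true, y_pred)                    # F-measure
auc     = roc_auc_score(y_true, y_score)              # area under the ROC curve
```

Note that the AUC is computed from the continuous scores, not the thresholded predictions, which is why it is insensitive to where the cutoff sits.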
> On Fri, Jan 15, 2016 at 11:08 PM, Yaroslav Halchenko <[email protected]> wrote:
> another solution is to try a classifier which provides weighting to the classes, e.g. as GNB with default prior setting does.

You don't have to weight classes differently; you can just move the cutoff (threshold) value towards the minority class. Or just look at the AUC.

Choosing a classifier is a different problem than choosing a performance measure. Even if you have a classifier that deals with imbalance nicely, you still need a performance measure that makes sense. So not accuracy.
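A hedged sketch of the cutoff-moving idea with scikit-learn's SVC (synthetic data; for brevity the threshold is chosen on the training set here, whereas in practice it should be picked on held-out data):

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.svm import SVC

# Toy imbalanced two-class problem: 90 majority vs 10 shifted minority samples.
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(90, 2), rng.randn(10, 2) + 1.5])
y = np.array([0] * 90 + [1] * 10)

clf = SVC(kernel='linear').fit(X, y)
scores = clf.decision_function(X)

# The default cutoff is 0; sweep candidate thresholds and keep the one that
# maximizes balanced accuracy (mean of per-class recalls).
thresholds = np.unique(scores)
bal_acc = [recall_score(y, (scores >= t).astype(int), average=None).mean()
           for t in thresholds]
best_t = thresholds[int(np.argmax(bal_acc))]
```

The classifier itself is untouched; only the decision rule `scores >= best_t` changes, which is what makes this cheaper to test than reweighting or resampling.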
_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa

