> From: Martin C. Martin
>
> Hi,
>
> I have a bunch of data points x from two classes A & B, and I'm creating
> a classifier. So I have a function f(x) which estimates the probability
> that x is in class A. (I have an equal number of examples of each, so
> p(class) = 0.5.)
>
> One way of seeing how well this does is to compute the error rate on the
> test set, i.e. if f(x) > 0.5 call it A, and see how many times I
> misclassify an item. That's what MASS does. But we should
Surely you mean `99% of dataminers/machine learners' rather than `MASS'?

> be able to do better: misclassifying should be more of a problem if the
> regression is confident than if it isn't.
>
> How can I show that my f(x) = P(x is in class A) does better than chance?

It depends on what you mean by `better'. For some problems, people are
perfectly happy with the misclassification rate. For others, the estimated
probabilities count a lot more. One possibility is to look at the ROC
curve. Another possibility is to look at the calibration curve (see MASS
the book).

Andy

> Thanks,
> Martin
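
To make the three options concrete (misclassification rate at the 0.5
cutoff, the ROC curve, and the calibration curve), here is a minimal
base-R sketch. It is not from the original thread: the data are simulated,
the scores `p' merely stand in for Martin's f(x), and all variable names
are illustrative.

    ## Simulated stand-in for Martin's setup: equal class sizes, so
    ## p(class) = 0.5.  y is 1 for class A, 0 for class B.
    set.seed(1)
    n <- 500
    y <- rep(c(1, 0), each = n)
    p <- plogis(rnorm(2 * n, mean = ifelse(y == 1, 1, -1)))  # "f(x)"

    ## 1. Misclassification rate at the 0.5 cutoff (the original approach)
    err <- mean((p > 0.5) != (y == 1))

    ## 2. ROC curve by hand: sweep the cutoff over the sorted scores and
    ## record the true and false positive rates at each cutoff.
    cuts <- sort(unique(p), decreasing = TRUE)
    tpr <- sapply(cuts, function(cc) mean(p[y == 1] >= cc))
    fpr <- sapply(cuts, function(cc) mean(p[y == 0] >= cc))
    plot(fpr, tpr, type = "l", xlab = "False positive rate",
         ylab = "True positive rate", main = "ROC curve")
    abline(0, 1, lty = 2)   # chance line

    ## 3. Calibration: within bins of predicted probability, compare the
    ## mean prediction with the observed fraction of class A.
    bins <- cut(p, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
    obs  <- tapply(y, bins, mean)   # observed frequency per bin
    pred <- tapply(p, bins, mean)   # mean predicted probability per bin
    plot(pred, obs, xlim = c(0, 1), ylim = c(0, 1),
         xlab = "Mean predicted probability",
         ylab = "Observed frequency", main = "Calibration")
    abline(0, 1, lty = 2)   # perfect calibration

A well-calibrated f(x) puts the binned points near the diagonal of the
calibration plot; a classifier that does better than chance has an ROC
curve above the diagonal.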