Nathaniel,

On Mon, 7 May 2007, [EMAIL PROTECTED] wrote:

> Date: Sun, 6 May 2007 12:02:31 +0000 (GMT)
> From: nathaniel Grey <[EMAIL PROTECTED]>
>
> However, what I really want to know is how well my neural net is
> doing at classifying my binary output variable. I am new to R and I
> can't figure out how you can assess the success rates of
> predictions.
I've recently been tackling this myself, though with respect to polytomous (>2) outcomes. The following approaches are based on Menard (1995), Cohen et al. (2003) and Manning & Schütze (1999).

First you have to decide on the critical probability that you use to classify the cases into class A (and consequently not(A)). The simplest choice is 0.5, but other levels may also be motivated, see e.g. Cohen et al. (2003: 516-519).

You can then treat the task as one of two distinct types of model, prediction or classification, which affects how the efficiency and accuracy of prediction are measured (Menard 1995: 24-26). In a pure prediction model we set no a priori expectation or constraint on the overall frequencies of the predicted classes. By contrast, in a classification model our expectation is that in the long run the predicted outcome classes will end up having the same proportions as are evident in the training data.

The starting point for evaluating prediction efficiency is to compile a 2x2 prediction/classification table. Frequency counts on the (descending) diagonal of the table indicate correctly predicted and classified cases, whereas all counts off the diagonal are incorrect. Overall, we can divide the predicted classifications into the four types presented below, on which the basic measures of prediction efficiency are based (Manning and Schütze 1999: 267-271).

Original/Predicted    Class                 not(Class) (=Other)
Class                 TP ~ True Positive    FN ~ False Negative
not(Class) (=Other)   FP ~ False Positive   TN ~ True Negative

You can then go on to calculate recall and precision, or sensitivity and specificity. Recall is the proportion of the original occurrences of some particular class for which the prediction is correct (formula 1 below, see Manning and Schütze 1999: 269, formula 8.4), whereas precision is the proportion of all the predictions of some particular class which turn out to be correct (formula 2 below, see Manning and Schütze 1999: 268, formula 8.3). Sensitivity is in fact exactly equal to recall, whereas specificity is the proportion of non-cases correctly predicted or classified as non-cases, i.e. rejected (formula 3 below). Furthermore, there is a third pair of evaluation measures one could also calculate, namely accuracy and its complement, error (formula 4 below; error = 1 - accuracy) (Manning and Schütze 1999: 268-270).

(1) Recall      = TP / (TP + FN)   (= Sensitivity)
(2) Precision   = TP / (TP + FP)
(3) Specificity = TN / (TN + FP)
(4) Accuracy    = (TP + TN) / N  =  SUM(diag(n)) / N
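To make this concrete, here is a minimal sketch in R of compiling the 2x2 table and computing measures (1)-(4) at the 0.5 cutoff. The data and the object names (obs, prob) are hypothetical stand-ins; in your case prob would be the fitted probabilities from your neural net (e.g. the output of predict() on the fitted model).

## Hypothetical observed binary outcomes and predicted probabilities
obs  <- factor(c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0))
prob <- c(0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3, 0.55, 0.45)

## Dichotomize at the critical probability 0.5
pred <- factor(ifelse(prob >= 0.5, 1, 0))

## 2x2 prediction/classification table: rows = original, columns = predicted
tab <- table(Original = obs, Predicted = pred)

TP <- tab["1", "1"]; FN <- tab["1", "0"]
FP <- tab["0", "1"]; TN <- tab["0", "0"]
N  <- sum(tab)

TP / (TP + FN)    # (1) recall = sensitivity
TP / (TP + FP)    # (2) precision
TN / (TN + FP)    # (3) specificity
(TP + TN) / N     # (4) accuracy, = sum(diag(tab)) / N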
However, as has been noted in some earlier responses, these general measures do not in any way take into consideration whether prediction and classification according to a model, with the help of explanatory variables, perform any better than simply knowing the overall proportions of the outcome classes. For this purpose, the asymmetric summary measures of association based on the Proportionate Reduction of Error (PRE) are good candidates for evaluating prediction accuracy, since with them we expect prediction or classification on the basis of the model to exceed some baseline or threshold. However, one cannot use the Goodman-Kruskal lambda and tau as such, but must make some adjustments to account for the possibility of incorrect prediction.

With this approach one compares the prediction/classification errors with the model, error(model), to the baseline level of prediction/classification errors without the model, error(baseline), according to formula (8) below (Menard 1995: 28-30). The formula for the error with the model remains the same irrespective of whether we are evaluating prediction or classification accuracy, presented in (5), but the errors without the model vary according to the intended objective, presented in (6) and (7). The measure for the proportionate reduction of prediction error is presented in (9) below; being analogous to the Goodman-Kruskal lambda, it is designated lambda(prediction). Similarly, the measure for the proportionate reduction of classification error is presented in (10); being analogous to the Goodman-Kruskal tau, it is likewise designated tau(classification). For both measures, positive values indicate better than baseline performance, negative values worse. A small R sketch of these measures follows after the references.

(5)  error(model) = N - SUM{k=1...K} n[k,k] = N - SUM(diag(n)),
     where n is the KxK (here 2x2) prediction/classification matrix

(6)  error(baseline, prediction) = N - max(R[k]),
     with R[k] the marginal sum of row k of the altogether K classes,
     and N the sum total of cases

(7)  error(baseline, classification) = SUM{k=1...K} R[k]·(N - R[k])/N,
     with R[k] and N as in (6)

(8)  PRE = (error(baseline) - error(model)) / error(baseline),
     with error(baseline) as in (6) for prediction or (7) for classification

(9)  lambda(prediction) = 1 - error(model) / error(baseline, prediction)

(10) tau(classification) = 1 - error(model) / error(baseline, classification)

REFERENCES:

Cohen, Jacob, Patricia Cohen, Stephen G. West, and Leona S. Aiken. 2003. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (3rd edition). Mahwah, New Jersey: Lawrence Erlbaum Associates.

Manning, Christopher D., and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, Massachusetts: MIT Press.

Menard, Scott. 1995. Applied Logistic Regression Analysis. Sage University Paper Series on Quantitative Applications in the Social Sciences 07-106. Thousand Oaks, California: Sage Publications.
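As promised above, here is a minimal sketch of formulas (5)-(10) as an R function, continuing the hypothetical example with the table tab computed earlier. The function name pre.measures is my own ad hoc label, not from the cited sources, and the input is assumed to be a table or matrix with the original classes in the rows.

## PRE measures from a prediction/classification table (original in rows)
pre.measures <- function(n) {
  N  <- sum(n)
  Rk <- rowSums(n)                      # marginal row sums R[k]
  err.model <- N - sum(diag(n))         # (5) error with the model
  err.pred  <- N - max(Rk)              # (6) baseline error, prediction
  err.class <- sum(Rk * (N - Rk) / N)   # (7) baseline error, classification
  c(lambda.prediction  = 1 - err.model / err.pred,   # (9)
    tau.classification = 1 - err.model / err.class)  # (10)
}

pre.measures(tab)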