Dear CW,
> Linear regression is not a black-box. I view prediction accuracy as
> overkill on interpretable models. Especially when you can use R-squared,
> coefficient significance, etc.

Following on my previous note about being cautious with cross-validated
evaluation for classification, the same caution applies to regression. About
20 years ago, chemoinformatics researchers pointed out the care needed when
using cross-validated R^2 (q^2) as a measure of performance:

"Beware of q2!" Golbraikh and Tropsha, J Mol Graph Model (2002) 20:269
https://www.sciencedirect.com/science/article/pii/S1093326301001231

In this article, they propose measuring correlation with both the
known-vs-predicted _and_ the predicted-vs-known calculations of the
correlation coefficient, and, importantly, they require that the regression
line fitted in each case passes through the origin. The resulting
coefficients are checked as a pair, and the authors argue that only if both
are high can one say the model fits the data well. Contrast this with the
Pearson product-moment correlation (R), where the fitted line is not
required to pass through the origin.

I found the paper above helpful as a filter for more robust regression
models, and I have implemented my own version of their method, which I use
as my first evaluation metric when performing regression modelling (a rough
sketch of that kind of check is in the P.S. below).

Hope this gives you something to think about.

> Prediction accuracy also does not tell you which feature is important.

The contributions of the scikit-learn community have yielded a great set of
tools for performing feature weighting separately from model performance
evaluation. All you need to do is read the documentation and try out some of
the examples (one is sketched in the P.P.S. below), and you should be ready
to adapt them to your situation.

J.B.
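P.S. In case it is useful, here is a rough sketch of the kind of check I
described above. It reflects my own simplified reading of the idea (fit the
known-vs-predicted and predicted-vs-known lines through the origin and
require both coefficients of determination to be high), not the authors'
exact formulas; the function names and the 0.6 cutoff are placeholders of
mine:

import numpy as np

def through_origin_r2(x, y):
    # Coefficient of determination for a least-squares line y = k * x
    # forced through the origin. A simplified reading of the idea in
    # Golbraikh & Tropsha (2002), not their exact formulas.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = np.sum(x * y) / np.sum(x * x)   # slope of the origin-constrained fit
    ss_res = np.sum((y - k * x) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def looks_robust(y_true, y_pred, threshold=0.6):
    # Accept the model only if BOTH origin-constrained coefficients
    # (known-vs-predicted and predicted-vs-known) are above the cutoff.
    # The 0.6 default is an arbitrary placeholder; pick your own.
    r2_known_vs_pred = through_origin_r2(y_pred, y_true)
    r2_pred_vs_known = through_origin_r2(y_true, y_pred)
    return min(r2_known_vs_pred, r2_pred_vs_known) > threshold

Something like looks_robust(y_test, model.predict(X_test)) on held-out
predictions is how I would use it, before looking at any other score.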
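P.P.S. On the feature-weighting point, permutation importance is one example
from scikit-learn worth trying. A minimal sketch, assuming a regression
setup (the synthetic data and the random forest are just placeholders for
your own X, y and estimator):

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Placeholder data; substitute your own X and y.
X, y = make_regression(n_samples=500, n_features=8, n_informative=3,
                       random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Score drop on held-out data when each feature is shuffled, which gives a
# feature weighting that is not tied to the model's internal importances.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")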