Thank you all for the replies. I agree that prediction accuracy is great for evaluating black-box ML models, especially advanced models like neural networks, or not-so-black-box models like LASSO (the tractable convex surrogate for best-subset selection, which is NP-hard).

Linear regression is not a black box, and I view prediction accuracy as overkill for interpretable models, especially when you can use R-squared, coefficient significance, and so on. Prediction accuracy also does not tell you which features are important. What do you all think? Thank you!
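To make the comparison concrete, here is a minimal sketch of the two views on synthetic data (statsmodels is only assumed here for the in-sample summary; nothing below comes from the earlier messages):

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(size=200)

# In-sample "statistical" view: R-squared and per-coefficient significance.
ols = sm.OLS(y, sm.add_constant(X)).fit()
print(ols.rsquared)   # in-sample R-squared
print(ols.pvalues)    # coefficient p-values

# Held-out "predictive" view: fit on half the data, score on the other half.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
y_pred = LinearRegression().fit(X_tr, y_tr).predict(X_te)
print(r2_score(y_te, y_pred))  # out-of-sample R-squared (prediction accuracy)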
On Mon, Jun 3, 2019 at 11:43 AM Andreas Mueller <t3k...@gmail.com> wrote:

> This classical paper on statistical practices (Breiman's "two cultures")
> might be helpful to understand the different viewpoints:
>
> https://projecteuclid.org/euclid.ss/1009213726
>
> On 6/3/19 12:19 AM, Brown J.B. via scikit-learn wrote:
>
>> As far as I understand: Holding out a test set is recommended if you
>> aren't entirely sure that the assumptions of the model hold (Gaussian
>> error on a linear fit; independent and identically distributed samples).
>> The model evaluation approach in predictive ML, using held-out data,
>> relies only on the weaker assumption that the metric you have chosen,
>> when applied to the test set you have held out, forms a reasonable
>> measure of generalised / real-world performance. (Of course this too
>> often does not hold in practice, but it is the primary assumption, in my
>> opinion, that ML practitioners need to be careful of.)
>
> Dear CW,
> As Joel has said, holding out a test set will help you evaluate the
> validity of model assumptions, and his last point (a reasonable measure of
> generalised performance) is absolutely essential for understanding the
> capabilities and limitations of ML.
>
> To add to your checklist for interpreting ML papers properly, be cautious
> when interpreting reports of high performance obtained with 5/10-fold or
> leave-one-out cross-validation on large datasets, where "large" depends on
> the nature of the problem setting.
> Results are also highly dependent on the distributions of the underlying
> independent variables (e.g., 60000 datapoints all with near-identical
> distributions may yield phenomenal performance in cross-validation and be
> almost non-predictive in truly unknown/prospective situations).
> Even at 500 datapoints, if independent variable distributions look similar
> (with similar endpoints), then when each model is trained on 80% of that
> data, the remaining 20% will certainly be predictable, and repeating that
> five times will yield statistics that seem impressive.
>
> So, again, while problem context completely dictates ML experiment design,
> metric selection, and interpretation of outcome, my personal rule of thumb
> is to do no more than 2-fold cross-validation (50% train, 50% predict)
> when I have 100+ datapoints.
> Even more extreme, try using 33% for training and 67% for validation (or
> even 20/80).
> If your model still reports good statistics, then you can believe that the
> patterns in the training data extrapolate well to those in the external
> validation data.
>
> Hope this helps,
> J.B.
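P.S. For anyone who wants to try the stricter protocol J.B. describes, a rough scikit-learn sketch might look like this (the synthetic data and the Ridge model are purely illustrative assumptions, not part of his message):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import ShuffleSplit, cross_val_score

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)

model = Ridge()

# Conventional 5-fold CV: roughly 80% train / 20% test per fold.
print(cross_val_score(model, X, y, cv=5).mean())

# Stricter check: repeated random 33% train / 67% validation splits.
strict = ShuffleSplit(n_splits=5, train_size=0.33, test_size=0.67,
                      random_state=0)
print(cross_val_score(model, X, y, cv=strict).mean())

ShuffleSplit just repeats the random 33/67 split several times; if the score holds up under it, that is closer to the extrapolation check J.B. is recommending.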
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn