2016-10-02 13:23 GMT+01:00 Thomas Evangelidis <[email protected]>:

> On 1 October 2016 at 20:48, Алексей Драль <[email protected]> wrote:
>
>> Hi Thomas,
>>
>> What quality do you have on training?
>>
>> There is no silver bullet, but there is a quite common technique you can
>> use to find out whether you are using an appropriate algorithm. You can
>> take a look at the difference between the "train" and "validation" quality
>> on learning curves (example
>> <http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html#example-model-selection-plot-learning-curve-py>).
>> If you see a big gap, you can reduce the complexity of your model to
>> overcome overfitting (reduce the interaction parameter / number of
>> variables / iterations / ...). If you see a small gap, you can try to
>> increase model complexity to fit your data better.
>
> Hi Алексей,
>
> the "Training examples" in the learning curves are the number of
> observations used for training? Don't you think my dataset is kind of small
> (42 observations) to use that technique?
>

Yes, it is really a tiny dataset =). You don't necessarily have to plot the
curve over the number of training observations. For instance, you can plot
it over the number of iterations.

>> Moreover, I see you have a tiny dataset and use a 50/50 split. I presume
>> that you will train the "production" model on the whole available dataset.
>> In that case, I suggest you use more data for training and use an
>> almost-LOO
>> <http://scikit-learn.org/stable/modules/cross_validation.html#leave-one-out-loo>
>> approach to better estimate your predictive quality. But be really
>> cautious about cross-validation, as you can easily overfit your data.
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Yours sincerely,
https://www.linkedin.com/in/alexey-dral
Alexey A. Dral
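
For illustration, here is a minimal sketch of a train/validation curve over the number of iterations rather than over the number of training observations. The thread does not say which model or data are in use, so the gradient boosting regressor, the 75/25 split and the synthetic 42-row dataset below are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a 42-observation dataset (assumption, not the real data).
X, y = make_regression(n_samples=42, n_features=5, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical model choice: any estimator that is fitted iteratively would do.
model = GradientBoostingRegressor(n_estimators=200, max_depth=2, random_state=0)
model.fit(X_train, y_train)

# staged_predict yields the predictions after each boosting iteration,
# so one fit gives the whole curve.
train_mse = [mean_squared_error(y_train, p) for p in model.staged_predict(X_train)]
val_mse = [mean_squared_error(y_val, p) for p in model.staged_predict(X_val)]

# Iteration with the lowest validation error, and the final train/validation gap.
best_iter = int(np.argmin(val_mse)) + 1
gap = val_mse[-1] - train_mse[-1]
print(f"best iteration: {best_iter}, final gap (val - train MSE): {gap:.1f}")
```

Plotting `train_mse` and `val_mse` against the iteration index gives the curve. A validation error that stops improving or rises while the training error keeps falling corresponds to the "big gap" case described in the quoted message. With so few validation rows the curve is noisy, so it is better read as a rough indication than as a precise stopping point.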
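
The leave-one-out suggestion quoted in this message can be sketched as follows. This is only an illustration under stated assumptions: the thread does not name the estimator or the task, so the `Ridge` regressor, the mean absolute error metric and the synthetic 42-row dataset are placeholders.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in for a 42-observation dataset (assumption, not the real data).
X, y = make_regression(n_samples=42, n_features=5, noise=10.0, random_state=0)

loo = LeaveOneOut()
n_splits = loo.get_n_splits(X)  # one split per observation

# Each observation is predicted by a model trained on the other 41.
# Ridge is a hypothetical choice; substitute the estimator actually in use.
y_pred = cross_val_predict(Ridge(alpha=1.0), X, y, cv=loo)

# Pool the held-out predictions and score them once.
loo_mae = mean_absolute_error(y, y_pred)
print(f"{n_splits} splits, LOO mean absolute error: {loo_mae:.2f}")
```

The held-out predictions are pooled before scoring because each LOO test fold contains a single observation, so a per-fold score such as R^2 is not defined. The caution about overfitting still applies: if the same LOO estimate is used repeatedly to pick hyperparameters, it becomes optimistic.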
