Dear scikit-learn users and developers,

I have a dataset consisting of 42 observations (molnames) and 4 variables (VDWAALS, EEL, EGB, ESURF), with which I want to build a predictive model that estimates the experimental value (Expr). I tried multivariate linear regression with 10,000 bootstrap repeats, each time using 21 observations for training and the remaining 21 for testing, but the average correlation was only R = 0.1727 +- 0.19779.
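For reference, the evaluation procedure described above can be sketched roughly as follows. This is only a minimal sketch under assumptions: the data here are synthetic placeholders with the same shape (42 x 4), the repeated random 21/21 split is done with ShuffleSplit rather than a true bootstrap with replacement, the helper name repeated_split_correlation is made up for illustration, and the number of repeats is reduced to keep it quick.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit


def repeated_split_correlation(X, y, n_repeats=1000, seed=0):
    """Mean and std of Pearson R on the held-out half over random 50/50 splits.

    Hypothetical helper, not part of scikit-learn.
    """
    # With 42 samples, test_size=0.5 gives 21 training and 21 test observations.
    splitter = ShuffleSplit(n_splits=n_repeats, test_size=0.5, random_state=seed)
    r_values = []
    for train_idx, test_idx in splitter.split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        predicted = model.predict(X[test_idx])
        # Pearson correlation between predicted and observed values
        r_values.append(np.corrcoef(predicted, y[test_idx])[0, 1])
    r_values = np.asarray(r_values)
    return r_values.mean(), r_values.std()


# Synthetic placeholder data (assumption): replace with the real
# VDWAALS, EEL, EGB, ESURF columns and the Expr values.
rng = np.random.default_rng(0)
X = rng.normal(size=(42, 4))
y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + rng.normal(scale=1.0, size=42)

mean_r, std_r = repeated_split_correlation(X, y, n_repeats=1000)
print(f"R = {mean_r:.4f} +- {std_r:.4f}")
```

With the real data, X would be the 42 x 4 array of energy terms and y the Expr column; setting n_repeats=10000 would match the number of repeats mentioned above.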
The first few rows of the data are:

molname        VDWAALS       EEL       EGB    ESURF      Expr
CHEMBL108457  -20.4848  -96.5826   23.4584  -5.4045  -7.27193
CHEMBL388269  -50.3860   28.9403  -51.5147  -6.4061  -6.8022
CHEMBL244078  -49.1466  -21.9869   17.7999  -6.4588  -6.61742
CHEMBL244077  -53.4365  -32.8943   34.8723  -7.0384  -6.61742
CHEMBL396772  -51.4111  -34.4904   36.0326  -6.5443  -5.82207
........

I would like your advice about what other machine learning algorithms I could try with these data. For example, can I build a decision tree, or are the observations and variables too few to avoid overfitting? I could include more variables, but the number of observations will always remain 42.

I would greatly appreciate any advice!

Thomas
_______________________________________________ scikit-learn mailing list [email protected] https://mail.python.org/mailman/listinfo/scikit-learn
