Dear scikit-learn users and developers,

I have a dataset consisting of 42 observations (molnames) and 4 variables
(VDWAALS, EEL, EGB, ESURF), from which I want to build a predictive model
that estimates the experimental value (Expr). I tried multivariate linear
regression with 10,000 bootstrap repeats, each time using 21 observations
for training and the remaining 21 for testing, but the average correlation
was only R = 0.1727 +- 0.19779.
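
For concreteness, the evaluation described above can be sketched roughly as
follows. This is only a sketch under stated assumptions: X and y are random
placeholders standing in for the 42 x 4 variable matrix and the Expr values
(the full dataset is not reproduced here), the helper name
repeated_split_correlation is hypothetical, and the splits are drawn without
replacement via ShuffleSplit, which is one reading of "21 for training and
the remaining 21 for testing" rather than a strict bootstrap with replacement.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit

# ASSUMPTION: random placeholder data, NOT the real measurements.
rng = np.random.default_rng(0)
X = rng.normal(size=(42, 4))   # stands in for VDWAALS, EEL, EGB, ESURF
y = rng.normal(size=42)        # stands in for Expr


def repeated_split_correlation(X, y, n_repeats=10000, test_size=21, seed=0):
    """Mean and std of Pearson R on the test half over repeated random splits."""
    splitter = ShuffleSplit(n_splits=n_repeats, test_size=test_size,
                            random_state=seed)
    r_values = []
    for train_idx, test_idx in splitter.split(X):
        # Fit on the training half, predict the held-out half.
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        # Pearson correlation between observed and predicted test values.
        r_values.append(np.corrcoef(y[test_idx], pred)[0, 1])
    r_values = np.asarray(r_values)
    return r_values.mean(), r_values.std()


# Fewer repeats than 10,000 here, only to keep the sketch quick to run.
mean_r, std_r = repeated_split_correlation(X, y, n_repeats=200)
print(f"R = {mean_r:.4f} +- {std_r:.4f}")
```

With the real data in place of the placeholders, n_repeats can be set back
to 10000 to match the setup described above.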


molname          VDWAALS       EEL       EGB     ESURF      Expr
CHEMBL108457    -20.4848  -96.5826   23.4584   -5.4045  -7.27193
CHEMBL388269    -50.3860   28.9403  -51.5147   -6.4061  -6.8022
CHEMBL244078    -49.1466  -21.9869   17.7999   -6.4588  -6.61742
CHEMBL244077    -53.4365  -32.8943   34.8723   -7.0384  -6.61742
CHEMBL396772    -51.4111  -34.4904   36.0326   -6.5443  -5.82207
........


I would like your advice about which other machine learning algorithms I
could try with these data. For example, can I build a decision tree, or are
the observations and variables too few to avoid overfitting? I could
include more variables, but the number of observations will always remain 42.
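
One way to probe the overfitting concern is to compare how well a model fits
the data it was trained on with how well it predicts left-out observations.
The sketch below does this with leave-one-out cross-validation for a linear
model and two decision trees. It is an illustration only: X and y are again
random placeholders (an assumption, not the real 42 molecules), and the
max_depth=2 setting is an arbitrary example value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# ASSUMPTION: random placeholder data, NOT the real measurements.
rng = np.random.default_rng(0)
X = rng.normal(size=(42, 4))
y = rng.normal(size=42)

models = {
    "linear": LinearRegression(),
    "tree (max_depth=2)": DecisionTreeRegressor(max_depth=2, random_state=0),
    "tree (unrestricted)": DecisionTreeRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    # R^2 on the same data the model was fitted to (optimistic).
    train_r2 = model.fit(X, y).score(X, y)
    # Each observation is predicted by a model trained on the other 41.
    loo_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    loo_rmse = float(np.sqrt(np.mean((y - loo_pred) ** 2)))
    results[name] = (train_r2, loo_rmse)
    print(f"{name:22s} train R^2 = {train_r2:.3f}   LOO RMSE = {loo_rmse:.3f}")
```

A large gap between a high training R^2 and a poor leave-one-out error is the
usual symptom of overfitting; an unrestricted tree can fit the training data
almost perfectly regardless of whether any real signal is present.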

I would greatly appreciate any advice!

Thomas
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
