Hi all,
Today I updated to the latest release of scikit-learn and went to test the
LassoCV estimator in linear_model. I've tried both approaches below, and the
accuracy I get is very poor, while the exact same data with glmnet in R gives
me roughly 75% accuracy:
from sklearn import linear_model
from sklearn.model_selection import StratifiedKFold, train_test_split
lassocv1 = linear_model.LassoCV(cv=10, max_iter=10000, n_alphas=10000)
xtrain, xtest, ytrain, ytest = train_test_split(
    endo_Xv, endo_y, test_size=0.25, random_state=1
)
lassocv1.fit(xtrain, ytrain)
lassocv1.score(xtest, ytest)
From this, lassocv1.coef_ comes back with all-zero coefficients.
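In case it helps with diagnosing this, this is the sort of check I can run on the
fitted lassocv1 from above (just a quick sketch using the standard LassoCV
attributes, nothing else assumed):

import numpy as np

# Quick sketch: inspect what the first LassoCV fit above actually selected.
print("selected alpha:", lassocv1.alpha_)
print("alpha grid:", lassocv1.alphas_.min(), "to", lassocv1.alphas_.max())
print("nonzero coefficients:", np.count_nonzero(lassocv1.coef_))
# mse_path_ has shape (n_alphas, n_folds); average over folds for each alpha
print("best mean CV MSE:", lassocv1.mse_path_.mean(axis=1).min())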
I've also tried this:
k_fold_S = StratifiedKFold(n_splits=10, shuffle=False)
lasso_cv = linear_model.LassoCV()
alphas=[]
scores=[]
coefs=[]
ks=[]
for k, (train, test) in enumerate(k_fold_S.split(endo_Xv, endo_y)):
    lasso_cv.fit(endo_Xv[train], endo_y[train])
    scores.append(lasso_cv.score(endo_Xv[test], endo_y[test]))
    alphas.append(lasso_cv.alpha_)
    coefs.append(lasso_cv.coef_)
    ks.append(k)
For every k, the coef_ arrays are all zero, and the scores list looks like this:
[-1.3295256159340241e-05,
-1.3295256159562285e-05,
-1.3295256159784328e-05,
-1.3295256159562285e-05,
-1.3295256159562285e-05,
-1.3295256159340241e-05,
-6.4162287406910323e-05,
-6.4162287406910323e-05,
-6.4162287406910323e-05,
-3.8436343168246623e-06]
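One difference I can think of is that, as far as I understand, glmnet standardizes
the predictors by default while LassoCV does not, so the next thing I plan to try
is scaling the features first. Roughly this (just a sketch reusing xtrain/xtest
from the first example, not something I've benchmarked yet):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Sketch: standardize features before the lasso, mimicking glmnet's default.
pipe = make_pipeline(StandardScaler(), linear_model.LassoCV(cv=10, max_iter=10000))
pipe.fit(xtrain, ytrain)
print("test R^2:", pipe.score(xtest, ytest))
print("nonzero coefficients:", (pipe.named_steps["lassocv"].coef_ != 0).sum())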
Any insights would be greatly appreciated. I'm not sure whether this has anything
to do with the update, but yesterday (before updating) I was getting better performance.