Could you generate more samples, set the penalty to none, reduce the tolerance, and check the coefficients instead of the predictions? This is just to make sure that the difference is not merely a numerical error.
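For instance, something along these lines (a minimal sketch of the suggested check, not the original code; the sample size, tolerance and max_iter values below are only illustrative choices):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# more samples than in the original script, same feature setup
X, y = make_classification(n_samples=20000, n_features=45, n_informative=10,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           n_clusters_per_class=1, random_state=2, shuffle=False)
X = StandardScaler().fit_transform(X)

coefs = {}
for solver in ('lbfgs', 'saga'):
    # penalty='none' removes regularisation entirely (newer scikit-learn
    # versions spell this penalty=None); tol is tightened well below the default
    clf = LogisticRegression(penalty='none', tol=1e-8, max_iter=100000,
                             solver=solver, random_state=2)
    clf.fit(X, y)
    coefs[solver] = clf.coef_.ravel()
    print(solver, 'n_iter_:', clf.n_iter_)

# If the two coefficient vectors agree to within a small tolerance, the
# differing predictions are only a numerical artefact of where each solver stops.
print('max abs coefficient difference:',
      np.abs(coefs['lbfgs'] - coefs['saga']).max())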
Sent from my phone - sorry for being brief and for any misspellings.

Original Message
From: benoit.pres...@u-bourgogne.fr
Sent: 8 October 2019 20:27
To: scikit-learn@python.org
Reply to: scikit-learn@python.org
Subject: [scikit-learn] logistic regression results are not stable between solvers

Dear scikit-learn users,

I am using logistic regression to make some predictions. On my own data, I do not get the same results between solvers. I managed to reproduce this issue on synthetic data (see the code below). All solvers seem to converge (n_iter_ < max_iter), so why do I get different results? If results between solvers are not stable, which one should I choose?

Best regards,
Ben

------------------------------------------
Here is the code I used to generate synthetic data:

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

RANDOM_SEED = 2

X_sim, y_sim = make_classification(n_samples=200,
                                    n_features=45,
                                    n_informative=10,
                                    n_redundant=0,
                                    n_repeated=0,
                                    n_classes=2,
                                    n_clusters_per_class=1,
                                    random_state=RANDOM_SEED,
                                    shuffle=False)

sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
                             random_state=RANDOM_SEED)
for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
    y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]

    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)

    classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                          verbose=1, random_state=RANDOM_SEED,
                                          C=1e9, solver='lbfgs')
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)

    classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                         verbose=1, random_state=RANDOM_SEED,
                                         C=1e9, solver='saga')
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)

    y_pred_lbfgs = classifier_lbfgs.predict(X_split_test)
    y_pred_saga = classifier_saga.predict(X_split_test)

    if not (y_pred_lbfgs == y_pred_saga).all():
        print('lbfgs does not give the same results as saga :-( !')
        exit()

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn