Re: [scikit-learn] logistic regression results are not stable between solvers

Benoît Presles Tue, 08 Oct 2019 11:22:00 -0700

As you can see in the code below, I do scale the data. I do not get any 
convergence warning, and n_iter_ is always below max_iter.
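
Note that n_iter_ < max_iter only shows that each solver's own stopping rule 
(tol) was met, not that lbfgs and saga stopped at the same point. A more direct 
check is to compare the fitted coefficients and the training log-loss of the 
two solvers. Below is a minimal sketch of that check, assuming the same 
synthetic data as in the code below. The helper name compare_solvers and the 
keys of its report are hypothetical, and max_iter is lowered only to keep the 
run short.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.preprocessing import StandardScaler


def compare_solvers(X, y, C=1e9, tol=1e-4, max_iter=1000, seed=2):
    """Fit lbfgs and saga on the same data and report how far apart they end up.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    report = {}
    for solver in ("lbfgs", "saga"):
        clf = LogisticRegression(C=C, tol=tol, max_iter=max_iter,
                                 solver=solver, random_state=seed)
        with warnings.catch_warnings():
            # Silence warnings here; hitting max_iter is visible in n_iter below.
            warnings.simplefilter("ignore")
            clf.fit(X, y)
        report[solver] = {
            "n_iter": int(clf.n_iter_[0]),
            "coef_norm": float(np.linalg.norm(clf.coef_)),
            "train_log_loss": float(log_loss(y, clf.predict_proba(X))),
            "coef": clf.coef_.ravel().copy(),
        }
    # Largest per-feature disagreement between the two fitted models.
    report["max_abs_coef_diff"] = float(
        np.max(np.abs(report["lbfgs"]["coef"] - report["saga"]["coef"]))
    )
    return report


# Same generator settings as in the original post (assumed here).
X, y = make_classification(n_samples=200, n_features=45, n_informative=10,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           n_clusters_per_class=1, random_state=2,
                           shuffle=False)
X = StandardScaler().fit_transform(X)

report = compare_solvers(X, y)
for solver in ("lbfgs", "saga"):
    r = report[solver]
    print(solver, "n_iter:", r["n_iter"], "coef norm:", r["coef_norm"],
          "train log-loss:", r["train_log_loss"])
print("max |coef_lbfgs - coef_saga|:", report["max_abs_coef_diff"])
```

If the coefficient norms are very large or differ widely between solvers, the 
almost unregularised problem (C=1e9) may have no well-defined finite optimum on 
that split, for example if the training set is linearly separable. That is an 
assumption to check, not something established in this thread.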



> On 8 Oct 2019, at 19:51, Andreas Mueller <[email protected]> wrote:
> 
> I'm pretty sure SAGA is not converging. Unless you scale the data, SAGA is 
> very slow to converge.
> 
>> On 10/8/19 7:19 PM, Benoît Presles wrote:
>> Dear scikit-learn users,
>> 
>> I am using logistic regression to make some predictions. On my own data, I 
>> do not get the same results between solvers. I managed to reproduce this 
>> issue on synthetic data (see the code below).
>> All solvers seem to converge (n_iter_ < max_iter), so why do I get different 
>> results?
>> If results are not stable between solvers, which one should I choose?
>> 
>> 
>> Best regards,
>> Ben
>> 
>> ------------------------------------------
>> 
>> Here is the code I used to generate synthetic data:
>> 
>> from sklearn.datasets import make_classification
>> from sklearn.model_selection import StratifiedShuffleSplit
>> from sklearn.preprocessing import StandardScaler
>> from sklearn.linear_model import LogisticRegression
>> #
>> RANDOM_SEED = 2
>> #
>> X_sim, y_sim = make_classification(n_samples=200,
>>                            n_features=45,
>>                            n_informative=10,
>>                            n_redundant=0,
>>                            n_repeated=0,
>>                            n_classes=2,
>>                            n_clusters_per_class=1,
>>                            random_state=RANDOM_SEED,
>>                            shuffle=False)
>> #
>> sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
>>                              random_state=RANDOM_SEED)
>> for train_index_split, test_index_split in sss.split(X_sim, y_sim):
>>     X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
>>     y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
>>     ss = StandardScaler()
>>     X_split_train = ss.fit_transform(X_split_train)
>>     X_split_test = ss.transform(X_split_test)
>>     #
>>     classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
>>                                           verbose=1, random_state=RANDOM_SEED,
>>                                           C=1e9, solver='lbfgs')
>>     classifier_lbfgs.fit(X_split_train, y_split_train)
>>     print('classifier lbfgs iter:',  classifier_lbfgs.n_iter_)
>>     classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
>>                                          verbose=1, random_state=RANDOM_SEED,
>>                                          C=1e9, solver='saga')
>>     classifier_saga.fit(X_split_train, y_split_train)
>>     print('classifier saga iter:', classifier_saga.n_iter_)
>>     #
>>     y_pred_lbfgs = classifier_lbfgs.predict(X_split_test)
>>     y_pred_saga  = classifier_saga.predict(X_split_test)
>>     #
>>     # Stop at the first split where the two solvers disagree on any test prediction.
>>     if not (y_pred_lbfgs == y_pred_saga).all():
>>         print('lbfgs does not give the same results as saga :-( !')
>>         break
>> 
>> _______________________________________________
>> scikit-learn mailing list
>> [email protected]
>> https://mail.python.org/mailman/listinfo/scikit-learn

