Hi All,

I am trying to understand scikit-learn's code [function '_fit_liblinear' in 
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py] 
for fitting an L2-regularized logistic regression with the 'liblinear' solver. 
More specifically, my dataset has approximately balanced classes, and the 
number of predictors [p=2000] is much larger than the number of observations 
[n=100]. What confuses me is that when I increase C [and thus decrease the 
regularization strength] while fitting the logistic regression to my training 
data, I still obtain high AUC when the model is applied to my test data. Is 
scikit-learn internally doing some feature selection when fitting this model 
with high C values? Otherwise, why do the almost unregularized model [high C] 
and the regularized model [C chosen by cross-validation] give similar AUC and 
accuracy results on the test data? Also, should I be coding my predictors as 
+1/-1?
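
For concreteness, here is a minimal sketch of roughly what I am doing, with 
make_classification standing in for my real data (which I cannot share); the 
Cs grid, CV folds, and informative-feature count are placeholders, not my 
actual settings:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    # Synthetic stand-in: n=100 observations, p=2000 predictors,
    # approximately balanced classes.
    X, y = make_classification(n_samples=100, n_features=2000,
                               n_informative=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0)

    # Almost unregularized fit (very large C => very weak L2 penalty).
    clf_high_c = LogisticRegression(penalty='l2', C=1e6, solver='liblinear')
    clf_high_c.fit(X_train, y_train)
    auc_high_c = roc_auc_score(y_test, clf_high_c.decision_function(X_test))

    # Cross-validated choice of C.
    clf_cv = LogisticRegressionCV(Cs=10, penalty='l2', solver='liblinear',
                                  cv=5)
    clf_cv.fit(X_train, y_train)
    auc_cv = roc_auc_score(y_test, clf_cv.decision_function(X_test))

    print("AUC (C=1e6): %.3f   AUC (CV-chosen C=%g): %.3f"
          % (auc_high_c, clf_cv.C_[0], auc_cv))
    # Probe the feature-selection question: compare coefficient norms.
    print("||w|| (C=1e6): %.2f   ||w|| (CV): %.2f"
          % (np.linalg.norm(clf_high_c.coef_),
             np.linalg.norm(clf_cv.coef_)))

Both fits give me similarly high test AUC, which is the behavior I am asking 
about.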

Any pointers/explanations would be much appreciated!

Thanks,
Kristen