That really depends on your dataset. Maybe it is an "easy" dataset and not much regularization is needed.
Maybe use PCA(n_components=2) or an LDA transform to take a look at your data in 2D. Maybe the classes are easily linearly separable? scikit-learn does not do any feature selection unless you ask it to. What C values are you using? Try an np.logspace, but go much farther out on both sides than you think reasonable, then plot AUC as a function of C to get a global idea of what is going on. A rough sketch of both ideas is below the quoted message.

hth,
Michael

On Friday, September 30, 2016, Kristen M. Altenburger <kalt...@stanford.edu> wrote:
> Hi All,
>
> I am trying to understand Python's code [function '_fit_liblinear' in
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py]
> for fitting an L2-logistic regression with the 'liblinear' solver. More
> specifically, my [approximately balanced class] dataset is such that the
> # of predictors [p=2000] >> # of observations [n=100]. I am therefore
> confused that when I increase C [and thus decrease the regularization
> strength] in fitting the logistic regression model to my training data,
> I still obtain high AUC results when the model is applied to my testing
> data. Is Python internally doing a feature selection when fitting this
> model for high C values? Or why is it that the almost unregularized model
> [high C values] and the regularized model [cross-validated approach to
> selecting C] give similar AUC and accuracy results on the testing data?
> Should I be coding my predictors as +1/-1?
>
> Any pointers/explanations would be much appreciated!
>
> Thanks,
> Kristen
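P.S. Roughly what I had in mind (an untested sketch, not a recipe -- X and y here stand in for your feature matrix and binary labels, and the particular C range is just an example; widen or shift it as needed):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# 1) Look at the data in 2D -- are the classes (nearly) linearly separable?
X_2d = PCA(n_components=2).fit_transform(X)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.title("PCA projection to 2D")
plt.show()

# 2) Sweep C over a much wider range than seems reasonable and
#    plot held-out AUC as a function of C.
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
Cs = np.logspace(-8, 8, 33)
aucs = []
for C in Cs:
    clf = LogisticRegression(penalty='l2', C=C, solver='liblinear')
    clf.fit(X_train, y_train)
    aucs.append(roc_auc_score(y_test, clf.decision_function(X_test)))

plt.semilogx(Cs, aucs)
plt.xlabel("C")
plt.ylabel("test AUC")
plt.show()

If the AUC curve is essentially flat across many orders of magnitude of C, that would explain why the cross-validated and nearly unregularized models look the same on your test data.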