2013/12/17 Doug Newman <[email protected]>:
> Hello,
>
> I am relatively new to classification problems in machine learning and have
> a somewhat general question regarding the behavior of SGDClassifier with
> loss='log' as compared to LogisticRegression in the sklearn package.
In theory they are optimizing the same objective function if the
regularizer type is the same ('l2' by default for both) and the
regularizer strength is the same (alpha = 1 / (C * n_samples) if I am
not mistaken, alpha being the parameter for SGDClassifier and C the
one for LogisticRegression; the n_samples factor comes from SGD
averaging the loss over samples while LogisticRegression sums it).
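For instance, here is a rough sketch of the conversion (the n_samples
factor is my reading of the two objective functions, so treat it as an
assumption rather than a guarantee):

```python
# Sketch: converting LogisticRegression's C into SGDClassifier's alpha.
# Assumes SGDClassifier averages the loss over samples while
# LogisticRegression sums it, hence the n_samples factor.

def c_to_alpha(C, n_samples):
    """alpha for SGDClassifier that (roughly) matches
    LogisticRegression(C=C) fit on n_samples training points."""
    return 1.0 / (C * n_samples)

print(c_to_alpha(1.0, 5_000_000))  # 2e-07 for a 5M-point dataset
```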
In practice, the SGD algorithm has noisy convergence behavior, and
you might need to tweak the learning rate schedule (the learning_rate,
eta0 and n_iter parameters) quite a bit to reach the same training
(and test) error.
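To make the schedule concrete, here is a minimal sketch of an
inverse-scaling decay of the sort those parameters control (the
exponent 0.5 here is illustrative, not necessarily the default):

```python
# Sketch of an inverse-scaling learning-rate schedule:
# eta_t = eta0 / t**power_t, so later updates perturb the weights less.

def eta(t, eta0=0.01, power_t=0.5):
    """Learning rate at step t (t >= 1) under inverse scaling."""
    return eta0 / t ** power_t

print([round(eta(t), 4) for t in (1, 100, 10000)])  # [0.01, 0.001, 0.0001]
```

If the rate decays too fast the weights freeze before convergence; too
slow and the final iterates keep bouncing around the optimum, which is
one source of the noisy behavior mentioned above.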
A last difference is that LogisticRegression also penalizes the
intercept of the model, while SGDClassifier does not. That should not
matter much in practice, unless the optimal `intercept_` is far from
zero.
> I have a ton of data points (5 million +) for a binary classification
> problem, so LogisticRegression works but is pretty slow to iterate on.
> Looking at the sklearn "start here" map, and doing some investigating on my
> own (some recommendations on this mailing list), I've tried using
> SGDClassifier to speed things up. What I noticed when doing this is that
> SGDClassifier with loss='log' spits out numbers that are much more
> "confident," meaning if I take its weights and put them through the logistic
> function, changing the decision threshold from 0.1->0.9 doesn't really
> change the predictions. In other words, it's spitting out large negative
> values when it thinks '0' and large positive values when it thinks '1', with
> little in-between. LogisticRegression, on the other hand, seems much less
> confident in its predictions. In other words, changing the decision
> threshold from 0.1->0.9 (in increments of 1/10) changes the predictions
> quite drastically. This makes the solution feel "more stable," and in my
> problem it is quite useful to know how confident my predictor is of a given
> prediction (and as far as I have learned, the LogisticRegression should spit
> out a probability -- exactly what I'd like).
This all probably depends on the value of the regularizer strength
(alpha for SGDClassifier and C for LogisticRegression). You should
always grid search the optimal value of the regularizer parameter (at
least on a significant subset of the full data if the full dataset is
too large).
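Something along these lines, sketched here with LogisticRegression and
a synthetic dataset standing in for a subsample of your real data (the
grid bounds and cv setting are just illustrative defaults):

```python
# Sketch: grid-searching the regularizer strength C for
# LogisticRegression on a synthetic binary classification problem.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Stand-in for a manageable subsample of the real 5M-point dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": np.logspace(-3, 3, 7)},  # log-spaced grid is typical
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

The same pattern works for SGDClassifier with a grid over alpha
instead of C.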
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general