I think alpha = 1/(2C)

On Tue, Dec 17, 2013 at 7:46 PM, Olivier Grisel <[email protected]> wrote:

> 2013/12/17 Doug Newman <[email protected]>:
> > Hello,
> >
> > I am relatively new to classification problems in machine learning and
> have
> > a somewhat general question regarding the behavior of SGDClassifier with
> > loss='log' as compared to LogisticRegression in the sklearn package.
>
> In theory they are optimizing the same objective function if the
> regularizer type is the same ('l2' by default for both) and the
> regularizer strength is the same (alpha = 1 / C if I am not mistaken,
> alpha being the parameter for SGDClassifier and C the one for
> LogisticRegression).
>
> In practice, the SGD algorithm has a noisy convergence behavior and
> you might need to tweak the learning rate schedule (learning_rate,
> eta0 and n_iter parameters) quite a bit to reach the same training
> (and test) error(s).
>
> The last difference is that LogisticRegression penalizes the intercept
> of the model. It should not matter much in practice, unless the
> optimal `intercept_` is far from zero.
>
> > I have a ton of data points (5 million +) for a binary classification
> > problem, so LogisticRegression works but is pretty slow to iterate on.
> > Looking at the sklearn "start here" map, and doing some investigating on
> my
> > own (some recommendations on this mailing list), I've tried using
> > SGDClassifier to speed things up.  What I noticed when doing this is that
> > SGDClassifier with loss='log' spits out numbers that are much more
> > "confident," meaning if I take its weights and put them through the
> logistic
> > function, changing the decision threshold from 0.1->0.9 doesn't
> really
> > change the predictions. In other words, it's spitting out large negative
> > values when it thinks '0' and large positive values when it thinks '1',
> with
> > little in-between.  LogisticRegression, on the other hand, seems much
> less
> > confident in its predictions. In other words, changing the decision
> > threshold from 0.1->0.9 (in increments of 1/10) changes the predictions
> > quite drastically. This makes the solution feel "more stable," and in my
> > problem it is quite useful to know how confident my predictor is of a
> given
> > prediction (and as far as I have learned, the LogisticRegression should
> spit
> > out a probability -- exactly what I'd like).
>
> This all probably depends on the value of the regularizer strength
> (alpha for SGDClassifier and C for LogisticRegression). You should
> always grid search the optimal value for the regularizer parameter (at
> least on a significant subset of the full data if the full data
> is too large).
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
