Re: [Scikit-learn-general] SGDClassifier(loss='log') vs. LogisticRegression

Mathieu Blondel Tue, 17 Dec 2013 06:59:36 -0800

On Tue, Dec 17, 2013 at 9:17 AM, Doug Newman <[email protected]> wrote:
>
> So, my question is two-fold: (1) Why this difference? and (2) Would you
> have any recommendations going forward? Is there a better algorithm or
> technique I could read up on that would give me a confidence score on a
> per-prediction basis that would have speed comparable to SGDClassifier?
>


You could try CDClassifier(loss="log") from lightning:
https://github.com/mblondel/lightning

liblinear hardcodes max_iter to 1000, which in practice can be far too many iterations.
With CDClassifier, you can set max_iter to a more reasonable value such as
50. This should speed up the training without too much loss in accuracy.
Also, CDClassifier supports both C and alpha, so you can set whichever you
prefer.
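To illustrate the max_iter trade-off without depending on lightning or scikit-learn, here is a hypothetical, simplified stand-in in plain Python. It is not CDClassifier's actual solver. It runs cyclic coordinate descent on the liblinear-style objective 0.5*||w||^2 + C * sum_i log(1 + exp(-y_i w.x_i)) with labels in {-1, +1}. Instead of a Newton step with line search, each coordinate takes the step -g_j / L_j, where L_j is an upper bound on the curvature built from the fact that the logistic loss has second derivative at most 1/4. This guarantees the objective never increases. The names cd_logistic and objective and the synthetic data are made up for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log1pexp(z):
    # Numerically stable log(1 + exp(z)).
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def objective(w, X, y, C):
    # liblinear-style primal: 0.5*||w||^2 + C * sum_i log(1 + exp(-y_i * w.x_i))
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = sum(log1pexp(-yi * sum(wj * xj for wj, xj in zip(w, xi)))
               for xi, yi in zip(X, y))
    return reg + C * loss


def cd_logistic(X, y, C=1.0, max_iter=50):
    # Hypothetical cyclic coordinate descent; one outer iteration = one pass over all features.
    n_features = len(X[0])
    w = [0.0] * n_features
    margins = [0.0] * len(X)  # m_i = y_i * w.x_i, kept up to date incrementally
    # Per-coordinate curvature bound: 1 (from the penalty) + C/4 * sum_i x_ij^2.
    L = [1.0 + 0.25 * C * sum(xi[j] ** 2 for xi in X) for j in range(n_features)]
    for _ in range(max_iter):
        for j in range(n_features):
            # Partial derivative of the objective with respect to w_j.
            g = w[j] - C * sum(yi * xi[j] * sigmoid(-m)
                               for xi, yi, m in zip(X, y, margins))
            step = -g / L[j]  # minimizer of the quadratic upper bound along coordinate j
            w[j] += step
            for i, (xi, yi) in enumerate(zip(X, y)):
                margins[i] += yi * xi[j] * step
    return w


# Synthetic data: two Gaussian features plus a constant 1.0 column acting as a bias.
rng = random.Random(0)
true_w = [2.0, -1.0, 0.5]
X, y = [], []
for _ in range(200):
    xi = [rng.gauss(0, 1), rng.gauss(0, 1), 1.0]
    p = sigmoid(sum(a * b for a, b in zip(true_w, xi)))
    X.append(xi)
    y.append(1 if rng.random() < p else -1)

w50 = cd_logistic(X, y, C=1.0, max_iter=50)
w1000 = cd_logistic(X, y, C=1.0, max_iter=1000)
f50, f1000 = objective(w50, X, y, 1.0), objective(w1000, X, y, 1.0)
print("relative objective gap after 50 passes:", (f50 - f1000) / f1000)

# With the log loss, sigmoid(w.x) is a per-prediction estimate of P(y = +1 | x).
p_first = sigmoid(sum(wj * xj for wj, xj in zip(w50, X[0])))
print("P(y=+1) for the first sample:", p_first)
```

Both runs start from w = 0 and take the same deterministic steps, so the 1000-pass run simply continues the 50-pass one and can only reach an equal or lower objective. On this small, well-conditioned problem the extra 950 passes buy almost nothing. With lightning itself, the corresponding call would be along the lines of CDClassifier(loss="log", max_iter=50); check lightning's documentation for the full parameter list.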

BTW, alpha should be equal to 1 / (C * n_samples) in SGDClassifier. This is
because liblinear multiplies the summed loss by C, whereas SGD divides it by
n_samples (i.e., it averages the loss). The 1/2 factor is not needed if both
the C-based and alpha-based objectives divide the regularization term by 2
(which I think is the case in both liblinear and our SGD implementation).
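A quick numerical check of the alpha = 1 / (C * n_samples) rule, written in plain Python and assuming an L2 penalty with the 1/2 factor in both objectives (the assumption stated above). The C-based form is 0.5*||w||^2 + C * sum_i loss_i and the alpha-based form is (1/n) * sum_i loss_i + alpha * 0.5*||w||^2. Multiplying the second by C * n_samples gives the first exactly, so both have the same minimizer. The helper names and the random data are hypothetical.

```python
import math
import random


def log_loss(margin):
    # Numerically stable log(1 + exp(-margin)) for a margin m = y * w.x.
    if margin < 0:
        return -margin + math.log1p(math.exp(margin))
    return math.log1p(math.exp(-margin))


def dot(a, b):
    return sum(x * z for x, z in zip(a, b))


def c_objective(w, X, y, C):
    # C-based form (liblinear-style): 0.5*||w||^2 + C * sum_i loss_i
    return 0.5 * dot(w, w) + C * sum(log_loss(yi * dot(w, xi)) for xi, yi in zip(X, y))


def alpha_objective(w, X, y, alpha):
    # alpha-based form (SGD-style): mean loss + alpha * 0.5*||w||^2
    n = len(X)
    return sum(log_loss(yi * dot(w, xi)) for xi, yi in zip(X, y)) / n + alpha * 0.5 * dot(w, w)


def alpha_from_C(C, n_samples):
    # Hypothetical helper implementing alpha = 1 / (C * n_samples).
    return 1.0 / (C * n_samples)


rng = random.Random(42)
n_samples, C = 500, 10.0
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n_samples)]
y = [rng.choice([-1, 1]) for _ in range(n_samples)]
alpha = alpha_from_C(C, n_samples)

# For any w, the C-based objective is (C * n_samples) times the alpha-based one.
ratios = []
for _ in range(5):
    w = [rng.gauss(0, 1) for _ in range(4)]
    ratios.append(c_objective(w, X, y, C) / alpha_objective(w, X, y, alpha))
print("alpha:", alpha, "ratios:", ratios)
```

Because the two objectives differ only by a positive constant factor, any solver that minimizes one also minimizes the other; only the scale of the regularization parameter changes.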

Mathieu

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general