Hello,
I am relatively new to classification problems in machine learning and have a
somewhat general question regarding the behavior of SGDClassifier with
loss='log' as compared to LogisticRegression in the sklearn package.
I have a ton of data points (5 million +) for a binary classification problem,
so LogisticRegression works but is pretty slow to iterate on. Looking at the
sklearn "start here" map, and doing some investigating on my own (some
recommendations on this mailing list), I've tried using SGDClassifier to speed
things up. What I noticed when doing this is that SGDClassifier with
loss='log' spits out numbers that are much more "confident," meaning if I take
the scores computed from its weights and put them through the logistic
function, then changing the decision threshold from 0.1->0.9 doesn't really
change the predictions. In
other words, it's spitting out large negative values when it thinks '0' and
large positive values when it thinks '1', with little in-between.
LogisticRegression, on the other hand, seems much less confident in its
predictions. In other words, changing the decision threshold from 0.1->0.9 (in
increments of 1/10) changes the predictions quite drastically. This makes the
solution feel "more stable," and in my problem it is quite useful to know how
confident my predictor is of a given prediction (and as far as I have learned,
the LogisticRegression should spit out a probability -- exactly what I'd like).
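To make the threshold sweep concrete, here is a minimal sketch in plain Python.
The scores are made-up illustrative numbers, not output from my data or from
either estimator, and the helper names are hypothetical. In scikit-learn the
raw scores would come from each model's decision_function. The sketch only
shows why scores of large magnitude make predictions insensitive to the
threshold, while scores near zero do not:

```python
import math


def sigmoid(score):
    """Logistic function, written to avoid overflow for large |score|."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def positives_per_threshold(scores, thresholds):
    """Count how many samples are predicted '1' at each probability threshold."""
    probs = [sigmoid(s) for s in scores]
    return [sum(p >= t for p in probs) for t in thresholds]


# Thresholds 0.1, 0.2, ..., 0.9, as in the sweep described above.
thresholds = [k / 10 for k in range(1, 10)]

# Hypothetical raw scores (w.x + b), chosen only for illustration:
# "saturated" mimics large-magnitude scores, "moderate" mimics scores near 0.
saturated_scores = [-9.0, -7.5, -6.0, 6.5, 8.0, 10.0]
moderate_scores = [-1.5, -0.8, -0.2, 0.3, 0.9, 1.6]

saturated_counts = positives_per_threshold(saturated_scores, thresholds)
moderate_counts = positives_per_threshold(moderate_scores, thresholds)

print("saturated:", saturated_counts)
print("moderate: ", moderate_counts)
```

With the saturated scores the count of positive predictions stays the same
across the whole sweep, whereas with the moderate scores it falls steadily as
the threshold rises.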
So, my question is two-fold: (1) Why this difference? and (2) Would you have
any recommendations going forward? Is there a better algorithm or technique I
could read up on that would give me a confidence score on a per-prediction
basis that would have speed comparable to SGDClassifier?
Thanks very much,
Doug
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general