Hello,

I am relatively new to classification problems in machine learning and have a
somewhat general question about the behavior of SGDClassifier with
loss='log' compared to LogisticRegression in the sklearn package.

I have a very large number of data points (5 million+) for a binary
classification problem, so LogisticRegression works but is quite slow to
iterate on. After looking at the sklearn "start here" map and doing some
investigating on my own (including some recommendations on this mailing list),
I tried using SGDClassifier to speed things up.

What I noticed is that SGDClassifier with loss='log' produces much more
"confident" outputs: if I take the scores computed from its weights and put
them through the logistic function, changing the decision threshold from 0.1
to 0.9 barely changes the predictions. In other words, it outputs large
negative values when it predicts '0' and large positive values when it
predicts '1', with little in between.

LogisticRegression, on the other hand, seems much less confident in its
predictions: changing the decision threshold from 0.1 to 0.9 (in steps of 0.1)
changes the predictions quite drastically. This makes the solution feel "more
stable," and in my problem it is quite useful to know how confident the
predictor is about a given prediction (as far as I have learned,
LogisticRegression should output a probability, which is exactly what I'd like).
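
The threshold sweep described above can be sketched in plain Python. The two score lists are made-up illustrative values, not output from either model. They mimic the two patterns: large-magnitude scores (like the SGDClassifier outputs) versus scores near zero (like the LogisticRegression outputs). The helper names logistic and positives_per_threshold are hypothetical.

```python
import math


def logistic(z: float) -> float:
    """Map a raw decision score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def positives_per_threshold(scores, thresholds):
    """Count how many samples are predicted '1' at each decision threshold."""
    probs = [logistic(s) for s in scores]
    return [sum(p >= t for p in probs) for t in thresholds]


# Thresholds 0.1, 0.2, ..., 0.9, as in the sweep described above.
thresholds = [i / 10 for i in range(1, 10)]

# Hypothetical scores: large magnitudes ("confident") vs. values near zero.
confident_scores = [-8.0, -6.5, -7.2, 6.9, 8.3, 7.5]
hedged_scores = [-2.0, -1.0, -0.3, 0.3, 1.0, 2.0]

print(positives_per_threshold(confident_scores, thresholds))
# -> [3, 3, 3, 3, 3, 3, 3, 3, 3]
print(positives_per_threshold(hedged_scores, thresholds))
# -> [6, 5, 4, 4, 3, 2, 2, 1, 0]
```

With large-magnitude scores, the logistic outputs sit very close to 0 or 1, so no threshold between 0.1 and 0.9 changes the predictions. Scores near zero spread the outputs across the interval, so each step of the sweep can flip some predictions.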

So, my question is two-fold: (1) Why this difference? and (2) Do you have any
recommendations going forward? Is there a better algorithm or technique I
could read up on that would give me a per-prediction confidence score, with
speed comparable to SGDClassifier?

Thanks very much,
Doug
                                          
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
