On 15.01.2012 19:45, Gael Varoquaux wrote:
> On Sun, Jan 15, 2012 at 07:39:00PM +0100, Philipp Singer wrote:
>> The problem is that my representation is very sparse so I have a huge
>> amount of zeros.
> That's actually good: some of our estimators are able to use a sparse
> representation to speed up computation.
>
>> Furthermore, the dataset is skewed: one class accounts for a large share
>> of the labels and a second one is also quite frequent.
>> I have successfully used logistic regression and could achieve a
>> recall of about 65% (on the best-case dataset). I am pretty happy with
>> that result. But the confusion matrix shows that many examples get
>> mapped to the large class.
> Use "class_weight='auto'" in the logistic regression to counter the
> effect of un-balanced classes.
>
> For SVMs, the following example shows the trick:
> http://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane_unbalanced.html
>
> HTH,
>
> Gael
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Thanks a lot for the help! This helped quite a bit, but I am still
not entirely happy with the results. Do you have any further ideas?

Thanks a lot
Philipp
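
For readers following along, here is a minimal sketch of the class-weighting
suggestion from the quoted reply. It is not taken from the thread. It assumes
a recent scikit-learn release, where the option is spelled
class_weight='balanced' (the 'auto' value mentioned in the reply was later
deprecated and removed). The synthetic data, the class counts, and all
variable names are made up for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Hypothetical skewed labels: 900 / 80 / 20 samples over three classes.
y = np.repeat([0, 1, 2], [900, 80, 20])

# Hypothetical features: a class-dependent shift, then about 90% of the
# entries zeroed out so the matrix is genuinely sparse.
X_dense = rng.normal(size=(y.size, 20)) + 0.5 * y[:, None]
mask = rng.rand(*X_dense.shape) < 0.1
X = sparse.csr_matrix(X_dense * mask)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# 'balanced' weights each class by n_samples / (n_classes * class_count),
# so rare classes contribute more to the loss. Computed by hand here only
# to show the numbers; the estimator derives them internally.
class_counts = np.bincount(y_train)
weights = len(y_train) / (len(class_counts) * class_counts)

plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(
    X_train, y_train
)

labels = [0, 1, 2]
cm_plain = confusion_matrix(y_test, plain.predict(X_test), labels=labels)
cm_balanced = confusion_matrix(y_test, balanced.predict(X_test), labels=labels)

print("class weights:", weights)
print("unweighted confusion matrix:\n", cm_plain)
print("balanced confusion matrix:\n", cm_balanced)
print("macro recall, unweighted:",
      recall_score(y_test, plain.predict(X_test), average="macro"))
print("macro recall, balanced:",
      recall_score(y_test, balanced.predict(X_test), average="macro"))
```

Comparing the two confusion matrices row by row shows how many examples of
the small classes are still mapped to the large class. Macro-averaged recall
is used here because it gives each class equal weight regardless of its size.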
