Hey guys!

I am currently trying to find the best possible classifier for my task.

In my case I usually have slightly more features than training
examples, about 5000 features overall. The problem is that my
representation is very sparse, so the vast majority of entries are
zeros. The labels range from 1 to 25.
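
With this much sparsity, the features can stay in a scipy.sparse CSR
matrix, which scikit-learn's LogisticRegression accepts directly. The
sketch below uses random stand-in data; the shapes, the 0.2% density
and the random labels are assumptions, not the real dataset.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression

n_samples, n_features = 4500, 5000  # slightly more features than examples
# Hypothetical stand-in data: 0.2% non-zero entries, stored in CSR format
X = sp.random(n_samples, n_features, density=0.002, format="csr", random_state=0)
y = np.random.RandomState(0).randint(1, 26, size=n_samples)  # labels 1..25

# Memory actually used by the CSR arrays vs. an equivalent dense float64 array
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
dense_bytes = n_samples * n_features * X.dtype.itemsize
print(f"CSR: {sparse_bytes / 1e6:.2f} MB, dense: {dense_bytes / 1e6:.1f} MB")

# LogisticRegression fits on the CSR matrix directly; no X.toarray() needed
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.coef_.shape)  # one weight vector per class
```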

Furthermore, the dataset is skewed: one class accounts for a large
share of the labels, and a second class is also quite frequent.

I have successfully used logistic regression and achieved a recall of
about 65% (on the best-case dataset). I am pretty happy with that
result. However, the confusion matrix shows that many examples get
assigned to the large class.
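
A commonly suggested remedy when predictions drift toward the majority
class is to reweight the classes in the loss. The sketch below uses
synthetic skewed data from make_classification (an assumption standing
in for the real data) to compare the default LogisticRegression with
class_weight="balanced", printing the confusion matrix and per-class
recall. With skewed labels, a single overall score can hide poor
recall on the small classes, so macro-averaged recall is often the
more telling number.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, deliberately skewed 5-class problem (stand-in for the real data)
X, y = make_classification(n_samples=3000, n_features=50, n_informative=10,
                           n_classes=5, weights=[0.6, 0.2, 0.1, 0.05, 0.05],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

results = {}
for name, cw in [("default", None), ("balanced", "balanced")]:
    # "balanced" weights each class by n_samples / (n_classes * class_count)
    clf = LogisticRegression(class_weight=cw, max_iter=1000)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    cm = confusion_matrix(y_test, pred)  # rows: true class, columns: predicted
    per_class_recall = recall_score(y_test, pred, average=None)
    results[name] = (cm, per_class_recall)
    print(name, "macro recall:", per_class_recall.mean().round(3))
    print(cm)
```

Comparing the two matrices shows whether reweighting moves predictions
off the large class; it usually trades some accuracy on that class for
recall on the small ones.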

Does anybody have an idea how to improve the classification? Maybe
some kind of feature selection, or something else?
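
One way to try feature selection is a chi-squared filter placed inside
a Pipeline, so the selection is refit within each cross-validation fold
and does not leak information from the validation data. This is a
sketch on hypothetical random sparse data; k=500, the density and the
scoring choice are assumptions to tune. Note that chi2 requires
non-negative features (e.g. counts or tf-idf values). An alternative
is an L1 penalty (penalty="l1" with solver="liblinear" or "saga"),
which drives many coefficients to zero during training.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
# Hypothetical sparse, non-negative features (chi2 requires X >= 0)
X = sp.random(600, 5000, density=0.05, format="csr", random_state=0)
y = rng.randint(1, 26, size=600)  # labels 1..25

# Selecting inside the pipeline keeps the selection out of the validation folds
pipe = make_pipeline(
    SelectKBest(chi2, k=500),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
scores = cross_val_score(pipe, X, y, cv=5, scoring="recall_macro")
print("macro recall per fold:", scores.round(3))

# Refit on all data and count how many features the filter kept
pipe.fit(X, y)
n_kept = pipe.named_steps["selectkbest"].get_support().sum()
print("features kept:", n_kept)
```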

Thanks in advance!

Philipp

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
