Hey guys!

I am currently trying to find the best possible classifier for my task.

In my case I usually have slightly more features than training examples, with about 5000 features overall. My representation is very sparse, so the feature matrix consists mostly of zeros. The labels range from 1 to 25. Furthermore, the dataset is skewed: one class accounts for a large share of the examples, and a second class is also quite frequent.

I have successfully used logistic regression and achieved a recall of about 65% on the best-case dataset. I am pretty happy with that result, but the confusion matrix shows that many examples get mapped to the large class.

Does anybody have an idea how to improve the classification? Maybe some kind of feature selection, or something else?

Thanks in advance!
Philipp
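
P.S. A minimal sketch of the setup described above, in case it helps the discussion. It runs on synthetic data, not the real dataset. The sample count, the class proportions, the chi2 selector with k=500, and the macro-averaged recall are all assumptions made for illustration; only the 5000 sparse features, the 25 labels, the skew, and the use of logistic regression come from the description.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
# Sizes are assumptions: 5000 features as described, sample count made up.
n_samples, n_features, n_classes = 3000, 5000, 25
labels = np.arange(1, n_classes + 1)

# Skewed labels 1..25: class 1 dominates, class 2 is second largest (assumed shares).
probs = np.full(n_classes, 0.3 / (n_classes - 2))
probs[0], probs[1] = 0.5, 0.2
y = rng.choice(labels, size=n_samples, p=probs)

# Sparse, non-negative background noise (about 1% non-zero entries).
noise = sparse.random(n_samples, n_features, density=0.01,
                      format="csr", random_state=rng)

# Weak class signal: each class owns 10 features that fire with probability 0.3.
rows = np.repeat(np.arange(n_samples), 10)
cols = ((y - 1)[:, None] * 10 + np.arange(10)).ravel()
vals = rng.binomial(1, 0.3, size=rows.size).astype(float)
signal = sparse.csr_matrix((vals, (rows, cols)), shape=(n_samples, n_features))
X = (noise + signal).tocsr()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Feature selection followed by logistic regression; chi2 needs non-negative input.
model = make_pipeline(SelectKBest(chi2, k=500),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=labels)
macro_recall = recall_score(y_test, y_pred, average="macro")

# Examples from other classes that were mapped to the large class (label 1).
absorbed_by_large_class = cm[:, 0].sum() - cm[0, 0]
print("macro recall:", macro_recall)
print("mapped to large class by mistake:", absorbed_by_large_class)
```

Comparing absorbed_by_large_class with and without the selection step would show whether feature selection reduces the drift towards the large class on a given dataset.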
