[Scikit-learn-general] Text classifier with varying training data size for each labelled set

Abhiram Koneru Thu, 07 Mar 2013 20:11:50 -0800

Hello,

I am trying to train a text classifier that classifies words and attaches
a label to each of them. I have a multi-class classifier (linear SVM). The
classifier works well on a small training set. The problem arises when I
run it on my actual training data. One of my labelled sets (let's say A)
has a large number of samples (~500), while the other labelled sets (say
B, C, D and E, with ~5 to 30 samples each) have far fewer by comparison.
Now this is where it gets weird: even if I enter an exact match from set
A, it is labelled B/C/D/E. I have tried changing the weights to 'auto',
but it had no effect. Should I be looking at other algorithms? I have no
clue how to proceed.
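
For concreteness, here is a minimal sketch of the kind of setup described
above. It is not the actual code or data: the toy words, the class sizes
and the character n-gram features are all assumptions made for
illustration, since the post does not say how the words are vectorized.
The 'auto' weighting mentioned above is spelled class_weight="balanced"
in later scikit-learn releases, which is what the sketch uses.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: class A is ten times larger than B and C.
words = (
    ["alpha%d" % i for i in range(50)]
    + ["bravo%d" % i for i in range(5)]
    + ["charlie%d" % i for i in range(5)]
)
labels = ["A"] * 50 + ["B"] * 5 + ["C"] * 5

# Assumed feature representation: character n-grams within each word.
# class_weight="balanced" reweights each class inversely to its frequency
# (this option was called "auto" in older scikit-learn versions).
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LinearSVC(class_weight="balanced"),
)
model.fit(words, labels)

# Sanity check: predict the training words themselves ("exact matches")
# and count, per true class, how many come back with the wrong label.
predicted = model.predict(words)
errors = Counter(t for t, p in zip(labels, predicted) if t != p)
train_accuracy = 1.0 - sum(errors.values()) / float(len(labels))

print("training accuracy:", train_accuracy)
print("misclassified per true class:", dict(errors))
```

If exact matches from A are mislabelled in a check like this, the
per-class error counts show whether the mistakes are confined to the
large class, which helps separate a weighting problem from a feature
problem.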


Thank you for your suggestions!


Thanks and Regards
-- 
Abhiram Koneru
Graduate Research Assistant
Clemson University
136 Fluor Daniel Building
Clemson, SC 29631
Email: [email protected] Ph no: (864)643-9672

