This could be due to the way normalization is done. How is CNB (Complementary Naive Bayes) performing? Do share the confusion matrices and per-label precision.
On Mon, Oct 10, 2011 at 11:20 PM, Grant Ingersoll wrote:

> I was trying the Naive Bayes classifier via the build-asf-email.sh file I
> committed the other day on a data set that had a fairly significant
> variation in the number of messages per training label and am noticing
> (still need to validate more) that the label with the least number of
> examples is often dominating the results. This seems counterintuitive to
> me. I would have expected the largest set would have dominated the results.
> If I even out the number of items per label, then I get reasonable results.
> Any thoughts on what I am seeing? If you are interested, I can share the
> details of the runs.
>
> -Grant
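The reply asks for confusion matrices and per-label precision. Assuming the test output can be dumped as (true label, predicted label) pairs, a minimal Python sketch like the following computes both. The function names and the toy labels are hypothetical and are not part of Mahout. A label that absorbs predictions meant for other labels shows up as a heavy column in the matrix and as low precision for that label.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """matrix[true_label][predicted_label] -> number of test documents."""
    counts = Counter(zip(actual, predicted))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}


def per_label_precision(matrix, labels):
    """Precision(L) = documents correctly labeled L / all documents labeled L.

    A label that was never predicted gets None, since precision is undefined.
    """
    precision = {}
    for p in labels:
        predicted_as_p = sum(matrix[t][p] for t in labels)  # column total
        precision[p] = matrix[p][p] / predicted_as_p if predicted_as_p else None
    return precision


def print_matrix(matrix, labels):
    """Print rows = true label, columns = predicted label."""
    width = max(len(s) for s in labels + ["true\\pred"]) + 2
    print("true\\pred".ljust(width) + "".join(p.rjust(width) for p in labels))
    for t in labels:
        print(t.ljust(width) + "".join(str(matrix[t][p]).rjust(width) for p in labels))


if __name__ == "__main__":
    # Toy data, made up for illustration: "small" is the rarest training label.
    labels = ["big", "mid", "small"]
    actual = ["big", "big", "big", "mid", "mid", "small"]
    predicted = ["small", "big", "small", "mid", "small", "small"]

    m = confusion_matrix(actual, predicted, labels)
    print_matrix(m, labels)
    print(per_label_precision(m, labels))  # {'big': 1.0, 'mid': 1.0, 'small': 0.25}
```

In the toy run, the rare label "small" is predicted four times but is right only once, so its precision is 0.25 while the other labels keep a precision of 1.0. That is roughly what the symptom described in the quoted message would look like in the numbers.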
