Robin,

Any luck with this?

On Oct 11, 2011, at 7:22 AM, Robin Anil wrote:

> I am guessing this is on the new naivebayes package. I would like to check
> the data and compare against the old implementation in case it's a bug.
> 
> On Tue, Oct 11, 2011 at 4:18 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> 
>> 
>> On Oct 11, 2011, at 1:47 AM, Robin Anil wrote:
>> 
>>> Could be due to the way normalization is done.
>> 
>> In what part of the process?
>> 
>>> How is CNB performing?
>> 
>> It's better, like 40% correct, 60% wrong, but still not good.
>> 
>>> Do share the confusion matrices and per-label precision.
>> 
>> Usually on the order of 5% correct, 95% wrong.
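
For anyone following along, here is a minimal sketch of how per-label precision falls out of a confusion matrix. It is plain Python, not Mahout code; the function names and the toy labels are made up for illustration, and the toy data only mimics the "smallest label dominates" symptom rather than reproducing the actual runs.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual_label, predicted_label) pair occurs."""
    return Counter(zip(actual, predicted))


def per_label_precision(actual, predicted):
    """Precision per label: correct predictions of a label / all predictions of it."""
    counts = confusion_counts(actual, predicted)
    labels = sorted(set(actual) | set(predicted))
    precision = {}
    for label in labels:
        predicted_as_label = sum(counts[(a, label)] for a in labels)
        # None marks a label the classifier never predicted (precision undefined).
        precision[label] = (
            counts[(label, label)] / predicted_as_label if predicted_as_label else None
        )
    return precision


# Hypothetical toy data: the classifier over-predicts the smallest label.
actual = ["big", "big", "big", "big", "small", "small"]
predicted = ["small", "small", "big", "small", "small", "small"]
print(per_label_precision(actual, predicted))
```

A label that soaks up most of the predictions shows up here as a low precision for that label, even when overall accuracy hides which label is responsible.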
>> 
>> If I bring the --maxItemsPerLabel (PrepEmailVectorsDriver) down to about
>> 1000, then I get better results, but still not better than guessing. The
>> main issue is that many of the mail archives have a ton of entries, but
>> a few have fewer than 1000. On the flip side, 1000 is not really enough
>> for training. If I restrict the input to mailing lists that have at
>> least 10K items, then I get much better results. Of course, this is
>> expected. What I don't understand is why it would be picking the labels
>> with the least amount of data.
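
To make the two workarounds above concrete, here is a rough sketch of capping the number of items per label and of dropping labels with too little data. This is illustrative Python only and is not how PrepEmailVectorsDriver implements --maxItemsPerLabel; the function names, the (label, document) tuple layout, and the list names are assumptions.

```python
from collections import Counter, defaultdict


def cap_items_per_label(examples, max_items_per_label):
    """Keep at most max_items_per_label examples per label, in input order."""
    kept = []
    seen = defaultdict(int)
    for label, doc in examples:
        if seen[label] < max_items_per_label:
            kept.append((label, doc))
            seen[label] += 1
    return kept


def drop_small_labels(examples, min_items_per_label):
    """Remove every example whose label has fewer than min_items_per_label items."""
    counts = Counter(label for label, _ in examples)
    return [(label, doc) for label, doc in examples if counts[label] >= min_items_per_label]


# Hypothetical toy corpus: one large list, one tiny list.
examples = [("list-a", f"msg {i}") for i in range(5)] + [("list-b", "msg 0")]

capped = cap_items_per_label(examples, 2)
filtered = drop_small_labels(examples, 3)
print(Counter(label for label, _ in capped))
print(Counter(label for label, _ in filtered))
```

Capping evens out the large labels but cannot help a label that is already below the cap, which matches the behaviour described above; dropping the small labels sidesteps the problem instead of explaining it.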
>> 
>>> 
>>> On Mon, Oct 10, 2011 at 11:20 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>>> 
>>>> I was trying the Naive Bayes classifier via the build-asf-email.sh file I
>>>> committed the other day on a data set that had a fairly significant
>>>> variation in the number of messages per training label and am noticing
>>>> (still need to validate more) that the label with the least number of
>>>> examples is often dominating the results.  This seems counterintuitive to
>>>> me.  I would have expected the largest set would have dominated the results.
>>>> If I even out the number of items per label, then I get reasonable results.
>>>> Any thoughts on what I am seeing?  If you are interested, I can share the
>>>> details of the runs.
>>>> 
>>>> -Grant
>>>> 
>> 
>> --------------------------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>> Lucene Eurocon 2011: http://www.lucene-eurocon.com
>> 
>> 

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com


