Hi
I was looking at the naive Bayes classifier's implementation because I
was surprised to see an n-gram parameter being used.
My understanding of 'traditional' naive Bayes is that it only
considers probabilities related to single words/tokens, independent of
context. Is that not what the Mahout implementation does? Are the N-
grams used to model N-sequences of tokens as additional "words" for
the algorithm to handle? Or are they used as input in some other way?
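To make my question concrete, here is a tiny sketch of what I imagine
"N-grams as words" would mean - consecutive token sequences joined
into composite features and then counted by naive Bayes exactly like
ordinary words. This is my own illustration, not Mahout code; the
class name, the '_' joining convention, and the n parameter are just
placeholders:

import java.util.*;

// Sketch only: my guess at "token N-grams treated as words".
// Not taken from the Mahout sources.
public class NGramFeatures {
    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox");
        int n = 2; // hypothetical n-gram size parameter

        // Start from the unigrams, then add each run of n consecutive
        // tokens as one composite "word".
        List<String> features = new ArrayList<>(tokens);
        for (int i = 0; i + n <= tokens.size(); i++) {
            features.add(String.join("_", tokens.subList(i, i + n)));
        }

        // Naive Bayes would then estimate P(feature | class) for each
        // entry independently, exactly as it does for single words.
        System.out.println(features);
        // -> [the, quick, brown, fox, the_quick, quick_brown, brown_fox]
    }
}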
From what I gather from NGrams.java, it seems to use "N-grams" of N
tokens, not N characters. Or are they not related to token sequences
but to character sequences somehow?
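For comparison, this is how I picture the two readings - N-grams over
the token list versus N-grams over the characters of a single string.
Again just an illustrative sketch under my own assumptions, not based
on NGrams.java itself:

import java.util.*;

// Sketch only: token N-grams vs. character N-grams for n = 2.
public class TokenVsCharNGrams {
    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("naive", "bayes", "classifier");
        String word = "naive";
        int n = 2;

        // Token N-grams: windows of n consecutive tokens.
        List<String> tokenGrams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            tokenGrams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        System.out.println(tokenGrams); // [naive bayes, bayes classifier]

        // Character N-grams: windows of n consecutive characters.
        List<String> charGrams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            charGrams.add(word.substring(i, i + n));
        }
        System.out.println(charGrams); // [na, ai, iv, ve]
    }
}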
Any help or pointers to materials the implementation is based on would
be appreciated. (I know that the Complementary Naive Bayes
implementation is quite different and based on a paper introducing
that method - but I'm wondering about the 'normal' Naive Bayes
implementation.)
Regards,
Loek