Thank you all for the responses. I was able to access the link provided by Robin. I will have to go through the document slowly to understand how the probabilities help or not. Will do that soon.
As for my pet project, I was trying to implement an EM algorithm using Naive Bayes. Hence, I think the class probabilities would not be equal (or cancel out), since I need to deal with a large amount of unlabelled data. While assigning probabilities to these documents, Pr(C) would also change. Maybe the document addresses this as well. Will keep the group posted on what I learn.

Thanks,
Guru

On Thu, Apr 29, 2010 at 12:15 PM, Ted Dunning <[email protected]> wrote:

> On Wed, Apr 28, 2010 at 11:25 PM, Gurudev Devanla <[email protected]> wrote:
>
> > This is my first post ever on any open source mailing list. So, please
> > excuse me if I am not following certain standards.
>
> You are doing great.
>
> > I was walking through the code for the Naive Bayes classifier and I
> > noticed that in TestClassifier.java, at the point where the document
> > weights are calculated, the probability of the class (label) is not
> > taken into consideration. My understanding of the document weight in
> > Naive Bayes is:
> >
> > Pr(C|D) = Pr(D|C) * Pr(C), but in the implementation I have
> > downloaded, I don't see Pr(C) being used in the calculation.
>
> Actually, the real computation is
>
> pr(C and D) = pr(D | C) * pr(C)
> pr(C | D) = pr(C and D) / pr(D) = pr(D | C) * pr(C) / pr(D)
>
> With D fixed to a single document under consideration, we don't need to
> consider pr(D) because
>
> argmax_C pr(C | D) = argmax_C pr(C, D)
>
> You are correct, however, that pr(C) might well be considered. It is
> conventionally assumed, however, that the probabilities of all classes
> are equal, so that this term can be ignored. If you have information
> about the a priori prevalence of the different categories, it would not
> be amiss to include this factor.
>
> This factor appears in equation (3) of the paper "Tackling the Poor
> Assumptions of Naive Bayes Text Classifiers" by Jason Rennie and others
> that Robin mentions, where log pr(C) is written as b_c.
> Just after this, however, the authors say:
>
> "the class probabilities tend to be overpowered by the combination of
> word probabilities, so we use a uniform prior estimate for simplicity"
>
> This is equivalent to saying that pr(C) = 1 / m where m is the number of
> categories.
>
> If you have trouble getting the PDF that Robin mentioned (CiteSeerX is
> like a yo-yo lately), you can get the slides for a talk by Jason on the
> same topic: http://people.csail.mit.edu/jrennie/talks/icml03.pdf
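To make the discussion above concrete, here is a small sketch of multinomial Naive Bayes scoring in log space, showing the argmax over log pr(D|C) + log pr(C) both with an empirical prior and with the uniform prior (which drops out of the argmax). The class names, word counts, and Laplace smoothing are all toy assumptions for illustration; this is not how the Mahout code is structured.

```python
import math

# Hypothetical toy training statistics (made up for illustration).
word_counts = {
    "sports": {"ball": 8, "game": 6, "vote": 1},
    "politics": {"vote": 7, "game": 2, "ball": 1},
}
class_counts = {"sports": 30, "politics": 10}  # labelled-document counts

vocab = {w for counts in word_counts.values() for w in counts}

def log_likelihood(doc, c):
    """log pr(D | C) under a multinomial model with Laplace smoothing."""
    total = sum(word_counts[c].values())
    return sum(
        math.log((word_counts[c].get(w, 0) + 1) / (total + len(vocab)))
        for w in doc
    )

def classify(doc, use_prior=True):
    n = sum(class_counts.values())
    scores = {}
    for c in word_counts:
        # log pr(C): empirical class frequency, or 0 for a uniform prior
        # (a constant log(1/m) shifts every score equally, so it can be
        # dropped from the argmax, as Ted notes above).
        log_prior = math.log(class_counts[c] / n) if use_prior else 0.0
        scores[c] = log_likelihood(doc, c) + log_prior
    return max(scores, key=scores.get)

doc = ["vote", "game", "game"]
print(classify(doc, use_prior=False))  # politics: word evidence alone
print(classify(doc, use_prior=True))   # sports: the 3:1 prior flips it
```

With these toy counts the word probabilities favour "politics" only slightly, so the 3:1 empirical prior changes the decision; with strongly separated classes the word probabilities overpower the prior, which is the behaviour Rennie et al. cite when justifying the uniform prior.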
