Re: [jira] Commented: (MAHOUT-60) Complementary Naive Bayes

Robin Anil Mon, 18 Aug 2008 12:41:39 -0700

Also, in a weight-normalized Bayes/CBayes implementation, the frequency of a
word in a document is divided by the document length, so this uniqueness gets
taken care of in the Feature Mapper/Reducer stage. If a word occurs more
often in the documents of a certain class, it is assumed to be a good feature
for that class. But if the same word occurs with the same frequency in two
documents of different classes, then the amount each contributes towards class
discrimination depends on the relative size of the documents. In that case,
the smaller document gives the word a higher normalized weight, so the word
is a better feature for the class that the smaller document belongs to.
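
To make the point above concrete, here is a minimal sketch in Python. It is
not the Mahout mapper/reducer code: the function name normalized_tf and the
two toy documents are made up for illustration, and it assumes "document
length" means the number of tokens in the document.

```python
from collections import Counter


def normalized_tf(tokens):
    """Divide each term count by the document length (number of tokens).

    Hypothetical helper for illustration; not part of Mahout.
    """
    counts = Counter(tokens)
    length = len(tokens)
    return {term: count / length for term, count in counts.items()}


# Two toy documents from different classes. "goal" occurs twice in each.
short_doc = ["goal", "goal", "match"]                              # class A
long_doc = ["goal", "goal", "market", "price", "trade", "stock"]   # class B

w_short = normalized_tf(short_doc)["goal"]  # 2 occurrences / 3 tokens
w_long = normalized_tf(long_doc)["goal"]    # 2 occurrences / 6 tokens

# The shorter document gives "goal" the larger normalized weight,
# so the word counts as a stronger feature for class A.
print(w_short, w_long)
```

With the raw counts equal, only the document length differs, so the word ends
up weighted more heavily for the class of the shorter document.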


I hope I am making sense in that long sentence.

On Mon, Aug 18, 2008 at 11:13 PM, Robin Anil (JIRA) <[EMAIL PROTECTED]> wrote:

>
>    [
> https://issues.apache.org/jira/browse/MAHOUT-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623415#action_12623415]
>
> Robin Anil commented on MAHOUT-60:
> ----------------------------------
>
> I am generating the bigrams. So if you keep only unique words, then bigrams
> don't get generated correctly.
>
>
>
>
> > Complementary Naive Bayes
> > -------------------------
> >
> >                 Key: MAHOUT-60
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-60
> >             Project: Mahout
> >          Issue Type: Sub-task
> >          Components: Classification
> >            Reporter: Robin Anil
> >            Assignee: Grant Ingersoll
> >            Priority: Minor
> >             Fix For: 0.1
> >
> >         Attachments: country.txt, MAHOUT-60-13082008.patch,
> MAHOUT-60-15082008.patch, MAHOUT-60-17082008.patch, MAHOUT-60.patch,
> MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch,
> twcnb.jpg
> >
> >
> > The focus is to implement an improved text classifier based on this paper
> http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
