[ 
https://issues.apache.org/jira/browse/MAHOUT-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623650#action_12623650
 ] 

Grant Ingersoll commented on MAHOUT-60:
---------------------------------------

Do we always want to use n-grams, though?  If n == 1, do we have a way of 
filtering out duplicates?  It seems that even if n > 1, you could still have 
duplicates.  I'm not sure how this is supposed to be handled; I will have to 
look into the code more.
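
To make the question concrete, here is a minimal sketch of how repeated 
n-grams arise for both n == 1 and n > 1, and one possible way to collapse 
them.  It is not taken from the MAHOUT-60 patch: the class NGramSketch and 
the helpers ngrams() and distinct() are hypothetical names, and whether the 
classifier should drop repeats or count them is exactly the open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class NGramSketch {

    // Hypothetical helper (not Mahout API): builds word n-grams by joining
    // each run of n consecutive tokens with a single space.
    public static List<String> ngrams(List<String> tokens, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }

    // Collapses repeated n-grams, keeping the order of first occurrence.
    public static Set<String> distinct(List<String> grams) {
        return new LinkedHashSet<>(grams);
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("new", "york", "new", "york", "city");
        for (int n = 1; n <= 2; n++) {
            List<String> grams = ngrams(tokens, n);
            System.out.println("n=" + n + " all=" + grams
                + " distinct=" + distinct(grams));
        }
    }
}
```

With this input, both the unigram list and the bigram list contain repeats, 
so any filtering would have to apply regardless of n.  If the frequencies 
matter, a map from n-gram to count would be the alternative to a set.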

> Complementary Naive Bayes
> -------------------------
>
>                 Key: MAHOUT-60
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-60
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Classification
>            Reporter: Robin Anil
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: country.txt, MAHOUT-60-13082008.patch, 
> MAHOUT-60-15082008.patch, MAHOUT-60-17082008.patch, MAHOUT-60.patch, 
> MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, twcnb.jpg
>
>
> The focus is to implement an improved text classifier based on this paper 
> http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
