[ https://issues.apache.org/jira/browse/MAHOUT-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623650#action_12623650 ]
Grant Ingersoll commented on MAHOUT-60:
---------------------------------------
Do we always want to use n-grams, though? If n == 1, do we have a way of
filtering out duplicates? It seems that even if n > 1, you could still have
duplicates. I'm not sure how this is supposed to be handled; I will have to
look into the code more.
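
To make the duplicate question concrete, here is a minimal sketch. It is not
taken from the MAHOUT-60 patch: the class name NGramDuplicates and the helpers
ngrams() and counts() are hypothetical, and whitespace-joined word n-grams are
an assumption. It shows that repeated n-grams appear for n == 1 and for n > 1,
and that one possible way to handle them is to collapse them into counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NGramDuplicates {

    // Hypothetical helper: joins every run of n consecutive tokens with a space.
    static List<String> ngrams(List<String> tokens, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }

    // Collapses repeated n-grams into (gram -> count), keeping first-seen order.
    static Map<String, Integer> counts(List<String> grams) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String gram : grams) {
            result.merge(gram, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("new", "york", "new", "york", "city");

        // n == 1: the unigrams "new" and "york" each occur twice.
        System.out.println(counts(ngrams(tokens, 1)));

        // n == 2: the bigram "new york" still occurs twice.
        System.out.println(counts(ngrams(tokens, 2)));
    }
}
```

Whether duplicates should be dropped outright or kept as frequencies depends
on what the classifier expects as input, which the comment above leaves open.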
> Complementary Naive Bayes
> -------------------------
>
> Key: MAHOUT-60
> URL: https://issues.apache.org/jira/browse/MAHOUT-60
> Project: Mahout
> Issue Type: Sub-task
> Components: Classification
> Reporter: Robin Anil
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 0.1
>
> Attachments: country.txt, MAHOUT-60-13082008.patch,
> MAHOUT-60-15082008.patch, MAHOUT-60-17082008.patch, MAHOUT-60.patch,
> MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, twcnb.jpg
>
>
> The focus is to implement an improved text classifier based on this paper
> http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf.
--
This message is automatically generated by JIRA.