[jira] Commented: (MAHOUT-60) Complementary Naive Bayes

Grant Ingersoll (JIRA) Mon, 18 Aug 2008 09:34:35 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623397#action_12623397 ]


Grant Ingersoll commented on MAHOUT-60:
---------------------------------------

I'm getting failures in the BayesFileFormatterTest, caused by the change to 
\t; that's an easy fix. However, I wonder why the check against the "seen" 
CharArraySet was removed. I seem to recall that we only want unique words for 
training, otherwise the calculations get screwed up, at least in the NB 
implementation (not sure what you want in CNB).

The loop used to look like:
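{code}
// Lucene 2.x TokenStream API: next(Token) reuses the Token passed in
while ((token = ts.next(token)) != null) {
  char[] termBuffer = token.termBuffer();
  int termLen = token.termLength();
  // only write each distinct term once per document
  if (!seen.contains(termBuffer, 0, termLen)) {
    if (numTokens > 0) {
      writer.write(' ');
    }
    writer.write(termBuffer, 0, termLen);
    // copy exactly termLen chars: termBuffer may be longer than the term,
    // and CharArraySet.add() has no (offset, length) overload
    char[] tmp = new char[termLen];
    System.arraycopy(termBuffer, 0, tmp, 0, termLen);
    seen.add(tmp);
    numTokens++;
  }
}
{code}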
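To make the question concrete, here is a Lucene-free sketch of what the old loop did, next to the per-document term counts that the de-duplication throws away. It is a hypothetical illustration, not Mahout code: `UniqueTerms`, `uniqueTerms` and `termCounts` are made-up names, and it assumes the document has already been tokenized into a `List<String>` instead of being read from a Lucene `TokenStream`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class UniqueTerms {

  /** Hypothetical helper: each distinct term once, in first-seen order, space-separated. */
  static String uniqueTerms(List<String> tokens) {
    Set<String> seen = new HashSet<>();
    StringBuilder out = new StringBuilder();
    int numTokens = 0;
    for (String token : tokens) {
      // Set.add returns false when the term was already seen in this document
      if (seen.add(token)) {
        if (numTokens > 0) {
          out.append(' ');
        }
        out.append(token);
        numTokens++;
      }
    }
    return out.toString();
  }

  /** Hypothetical helper: per-document term frequencies, i.e. what de-duplication discards. */
  static Map<String, Integer> termCounts(List<String> tokens) {
    Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable output
    for (String token : tokens) {
      counts.merge(token, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> doc = Arrays.asList("the", "cat", "saw", "the", "cat");
    System.out.println(uniqueTerms(doc)); // the cat saw
    System.out.println(termCounts(doc));  // {cat=2, saw=1, the=2}
  }
}
```

With the "seen" check, a term contributes at most once per document, so frequency information such as `cat=2` never reaches the trainer; whether CNB needs those counts is the open question above.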

> Complementary Naive Bayes
> -------------------------
>
>                 Key: MAHOUT-60
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-60
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Classification
>            Reporter: Robin Anil
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: country.txt, MAHOUT-60-13082008.patch, 
> MAHOUT-60-15082008.patch, MAHOUT-60-17082008.patch, MAHOUT-60.patch, 
> MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, twcnb.jpg
>
>
> The focus is to implement an improved text classifier based on this paper 
> http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf.

