Those results are inconclusive, since the test only covers a few strings. A
slight change in weights can throw things off. I put it in as a sanity check
to ensure no one changes the code drastically without others knowing about it.

Can you run the 20newsgroups example and print the confusion matrix? If
that didn't change, I don't think we need to look too much into the test. We
can add more words to the test and change the weight.
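
For that before/after comparison, a minimal sketch of diffing two confusion
matrices might look like the following. It is Python rather than Mahout's
Java, the function names are hypothetical, and it assumes the actual and
predicted labels have already been collected from a run with and without the
patch.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def changed_cells(before, after):
    """Return {cell: (count_before, count_after)} for cells that differ."""
    cells = set(before) | set(after)
    # Counter returns 0 for missing cells, so absent pairs compare cleanly.
    return {c: (before[c], after[c]) for c in cells if before[c] != after[c]}
```

An empty result from `changed_cells` would mean the patch left the
classification of that data set unchanged.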


On Thu, Oct 6, 2011 at 1:50 AM, Isabel Drost (Updated) (JIRA) <
[email protected]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/MAHOUT-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Isabel Drost updated MAHOUT-826:
> --------------------------------
>
>    Attachment: mahout-826.patch
>
> I've taken a quick look at the patch - technically looks good. Added a test
> case for each modification and reformatted slightly.
>
> Concerning the test failure: I see the same. The change seems to cause a
> different output when testing the "BayesAlgorithm" - output remains the same
> for the "CBayesAlgorithm".
>
> Robin, do you have any quick explanation for that? Otherwise we might have
> to dig a bit deeper.
>
> > Bayes/CBayes classification on a non-existing feature
> > -----------------------------------------------------
> >
> >                 Key: MAHOUT-826
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-826
> >             Project: Mahout
> >          Issue Type: Bug
> >          Components: Classification
> >            Reporter: Andre-Philippe Paquet
> >            Priority: Minor
> >         Attachments: mahout-826.patch, mahout-826.patch
> >
> >
> > (see http://comments.gmane.org/gmane.comp.apache.mahout.user/9597)
> > Using CBayes or Bayes, when trying to classify a feature/word that
> doesn't exist in the model, instead of returning the default/unknown label,
> the algorithm returns all labels with a constant score (ex:
> 12.386649147018964). After a quick look in CBayesAlgorithm, I found the
> problem in the featureWeight function that returns the theta normalized
> weight even if the feature didn't have any match (result=0).
> > As a fix, I overrode the function in a subclass and return 0 if the
> weight of the current feature in the current label is 0.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>
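
The behaviour described in the issue can be illustrated with a toy sketch.
This is Python, not the Mahout code: the function names, the smoothing
constant, and the weight formula are all assumptions made for illustration
and are not Mahout's actual normalization. It only shows why a smoothed
weight gives an unseen feature the same non-zero score in every label, and
how returning 0 for a zero raw weight avoids that.

```python
import math

ALPHA = 1.0  # smoothing constant, assumed for illustration only


def smoothed_weight(count, label_total, vocab_size):
    """Toy smoothed log-weight; a stand-in, not Mahout's formula."""
    return -math.log((count + ALPHA) / (label_total + ALPHA * vocab_size))


def feature_weight_unpatched(counts, label, feature, label_totals, vocab_size):
    # An unseen feature has count 0 but still receives a smoothed weight.
    count = counts.get((label, feature), 0.0)
    return smoothed_weight(count, label_totals[label], vocab_size)


def feature_weight_patched(counts, label, feature, label_totals, vocab_size):
    # The fix from the issue: return 0 when the raw weight for this
    # feature in this label is 0.
    count = counts.get((label, feature), 0.0)
    if count == 0.0:
        return 0.0
    return smoothed_weight(count, label_totals[label], vocab_size)
```

In this sketch a feature that occurs in one label but not in another also
gets 0 for the label where it is missing, so scores for known words can
change too.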
