Those are inconclusive, as the test checks only a few strings. A slight change in the weights can throw things off. I put it in as a sanity check to ensure no one changes the code drastically without others knowing about it.
Can you run the 20newsgroups example and print the confusion matrix? If that didn't change, I don't think we need to look too much into the test. We can stuff in more words and change the weights.

On Thu, Oct 6, 2011 at 1:50 AM, Isabel Drost (Updated) (JIRA) <[email protected]> wrote:

>
> [ https://issues.apache.org/jira/browse/MAHOUT-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Isabel Drost updated MAHOUT-826:
> --------------------------------
>
>     Attachment: mahout-826.patch
>
> I've taken a quick look at the patch - technically looks good. Added a test case for each modification and reformatted slightly.
>
> Concerning the test failure: I see the same. The change seems to cause a different output when testing the "BayesAlgorithm" - output remains the same for the "CBayesAlgorithm".
>
> Robin, do you have any quick explanation for that? Otherwise we might have to dig a bit deeper.
>
> > Bayes/CBayes classification on a non-existing feature
> > -----------------------------------------------------
> >
> >          Key: MAHOUT-826
> >          URL: https://issues.apache.org/jira/browse/MAHOUT-826
> >      Project: Mahout
> >   Issue Type: Bug
> >   Components: Classification
> >     Reporter: Andre-Philippe Paquet
> >     Priority: Minor
> >  Attachments: mahout-826.patch, mahout-826.patch
> >
> >
> > (see http://comments.gmane.org/gmane.comp.apache.mahout.user/9597)
> > Using CBayes or Bayes, when trying to classify a feature/word that doesn't exist in the model, instead of returning the default/unknown label, the algorithm returns all labels with a constant score (ex: 12.386649147018964). After a quick look in CBayesAlgorithm, I found the problem in the featureWeight function that returns the theta normalized weight even if the feature didn't have any match (result=0).
> > As a fix, I overrided the function in a subclass and return 0 if the weight of the current feature in the current label is 0.
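For reference, the fix described in the issue could be sketched roughly as below. This is a minimal standalone illustration, not the Mahout code: the class UnknownFeatureSketch, all of its methods, and the Laplace-style smoothing standing in for the theta normalization are assumptions made up for this example. Only the idea of returning 0 when the feature's weight under the current label is 0 comes from the report.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a trained model; not a Mahout class.
class UnknownFeatureSketch {
    // label -> (feature -> raw weight seen in training)
    private final Map<String, Map<String, Double>> weights = new HashMap<>();
    // every feature seen under any label
    private final Set<String> vocabulary = new HashSet<>();

    void put(String label, String feature, double weight) {
        weights.computeIfAbsent(label, k -> new HashMap<>()).put(feature, weight);
        vocabulary.add(feature);
    }

    // Raw weight of the feature under the label; 0.0 when there is no match.
    double rawWeight(String label, String feature) {
        Map<String, Double> perLabel = weights.get(label);
        if (perLabel == null) {
            return 0.0;
        }
        Double w = perLabel.get(feature);
        return w == null ? 0.0 : w;
    }

    // Sum of all raw weights stored for the label.
    double labelTotal(String label) {
        Map<String, Double> perLabel = weights.get(label);
        double sum = 0.0;
        if (perLabel != null) {
            for (double w : perLabel.values()) {
                sum += w;
            }
        }
        return sum;
    }

    // Smoothed weight with no guard (assumed add-one smoothing):
    // an unseen feature still gets a nonzero score under every label.
    double unguardedWeight(String label, String feature) {
        double raw = rawWeight(label, feature);
        return Math.log((raw + 1.0) / (labelTotal(label) + vocabulary.size()));
    }

    // Guarded variant, following the fix in the report:
    // return 0 when the feature has no weight under this label.
    double guardedWeight(String label, String feature) {
        if (rawWeight(label, feature) == 0.0) {
            return 0.0;
        }
        return unguardedWeight(label, feature);
    }

    public static void main(String[] args) {
        UnknownFeatureSketch model = new UnknownFeatureSketch();
        model.put("sports", "ball", 3.0);
        model.put("politics", "vote", 2.0);
        for (String label : new String[] {"sports", "politics"}) {
            System.out.println(label
                + " unguarded=" + model.unguardedWeight(label, "zzz")
                + " guarded=" + model.guardedWeight(label, "zzz"));
        }
    }
}
```

With the guard in place, a word that is absent from the model contributes nothing to any label, so the caller can fall back to the default/unknown label. Features that are present score the same either way, which is why the confusion matrix on 20newsgroups is a reasonable check that nothing else moved.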
