[
https://issues.apache.org/jira/browse/MAHOUT-605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003114#comment-13003114
]
Robin Swezey commented on MAHOUT-605:
-------------------------------------
Hello,
On the mailing list, Robin A said that the weights are to be considered as
negative numbers. That settles the question of whether this is CNB or not:
it is CNB.
I have 2 very quick questions before the issue is resolved:
1/ This implementation is a TWCNB, am I right?
2/ If we re-run the SinglyClassifier2 example of my previous post
(http://pastebin.com/VMUVGmUd lines 57-61), but limit the number of results to
3 like this:
bq. String[] doc = {"mspublisher", "parallax", "polaroid", "corel", "illustrator", "coreldraw"};
bq. SinglyClassifier2 sc = new SinglyClassifier2();
bq. List<ClassifierResult> results = sc.classifyDocument(doc, "comp.graphics", 3);
Then we get as output:
bq. [ClassifierResult{category='sci.med', score=71.12823038989241},
ClassifierResult{category='talk.politics.mideast', score=71.12905966433597},
ClassifierResult{category='sci.crypt', score=71.13190725486677}]
So this is not an N-best output but an N-worst one. Hence, there might still
be a problem with this line in CBayesAlgorithm.classifyDocument(String[]
document, Datastore datastore, String defaultCategory, int numResults):
bq. Collections.reverse(result);
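To make the expected behaviour concrete, here is a minimal sketch of an N-best selection. It is not the Mahout code: ScoredCategory is a hypothetical stand-in for ClassifierResult, NBestSketch and nBest are names invented for this example, and it assumes that a higher score means a better match. If the opposite convention applies, only the comparator needs to be flipped. The point is that the sort has to happen before the list is cut down to N entries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for Mahout's ClassifierResult, so the sketch compiles on its own.
class ScoredCategory {
    final String category;
    final double score;

    ScoredCategory(String category, double score) {
        this.category = category;
        this.score = score;
    }
}

class NBestSketch {
    // Returns the n highest-scoring results, best first.
    // Assumption: a higher score is a better match.
    static List<ScoredCategory> nBest(List<ScoredCategory> results, int n) {
        // Copy first so the caller's list is left untouched.
        List<ScoredCategory> sorted = new ArrayList<>(results);
        // Sort in descending order of score BEFORE truncating.
        sorted.sort(Comparator.comparingDouble((ScoredCategory r) -> r.score).reversed());
        // Keep at most n entries; guard against n < 0 or n > size.
        int limit = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }
}
```

Truncating an ascending list to N entries and only then reversing it would return the N lowest scores in descending order, which would match the symptom described above.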
> Array returned by classifier.bayes.algorithm.CBayesAlgorithm.classifyDocument
> is sorted ascendant
> -------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-605
> URL: https://issues.apache.org/jira/browse/MAHOUT-605
> Project: Mahout
> Issue Type: Bug
> Components: Classification
> Affects Versions: 0.4
> Environment: Linux
> Reporter: Robin Swezey
> Assignee: Robin Anil
> Priority: Minor
> Labels: bayesian, classification
> Fix For: 0.5
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The array returned for an n-best call to classifyDocument is sorted in
> ascending order instead of descending order.
> Ex:
> {quote}
> 47-best: [ClassifierResult\{category='香川県', score=32.28281232047167\},
> ClassifierResult\{category='宮崎県', score=32.28969992600906\}, ......,
> ClassifierResult\{category='愛知県', score=32.487981016587796\},
> ClassifierResult\{category='東京都', score=32.49189358054859\},
> ClassifierResult\{category='北海道', score=32.49811200756193\}]
> {quote}
> (classification of documents for Japanese prefectures)
> Inside the classifyDocument method, just before the return statement we found
> this line:
> {quote}
> Collections.reverse(result);
> {quote}
> Is this a mistake or a design choice? (we are not sure, hence the "Minor"
> priority)