[
https://issues.apache.org/jira/browse/MAHOUT-605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994046#comment-12994046
]
Robin Swezey commented on MAHOUT-605:
-------------------------------------
Robin,
Thank you for your very quick answer. As shown in the example in the original
report, we currently have 47 classes (Japanese prefectures). However, we want to
scale to more than 1700 classes (Japanese cities), hence the need for CNB
(Complementary Naive Bayes): the Japanese Wikipedia corpus contains little
information on small cities.
I have a paper in review which uses this feature of Mahout and explains it in
more detail, in case you need it.
If there is a mistake in the code, it could help explain why our classifier
does not perform very well at the moment.
> Array returned by classifier.bayes.algorithm.CBayesAlgorithm.classifyDocument
> is sorted ascendant
> -------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-605
> URL: https://issues.apache.org/jira/browse/MAHOUT-605
> Project: Mahout
> Issue Type: Bug
> Components: Classification
> Affects Versions: 0.4
> Environment: Linux
> Reporter: Robin Swezey
> Assignee: Robin Anil
> Priority: Minor
> Labels: bayesian, classification
> Fix For: 0.5
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The array returned for an n-best call to classifyDocument is sorted in
> ascending order instead of descending order.
> Ex:
> {quote}
> 47-best: [ClassifierResult{category='香川県', score=32.28281232047167},
> ClassifierResult{category='宮崎県', score=32.28969992600906}, ......,
> ClassifierResult{category='愛知県', score=32.487981016587796},
> ClassifierResult{category='東京都', score=32.49189358054859},
> ClassifierResult{category='北海道', score=32.49811200756193}]
> {quote}
> (classification of documents for Japanese prefectures)
> Inside the classifyDocument method, just before the return statement we found
> this line:
> {quote}
> Collections.reverse(result);
> {quote}
> Is this a mistake or a design choice? (We are not sure, hence the "Minor"
> priority.)
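
Until the intended ordering is confirmed, a caller could avoid depending on it by sorting the n-best list explicitly. The following is only a minimal sketch under stated assumptions: `ScoredLabel` is a hypothetical stand-in for Mahout's ClassifierResult (it is not the Mahout class), `sortedByScore` is an illustrative helper name, the prefecture names are romanized, and the code assumes Java 8 or later. It does not settle whether a lower or a higher score means a better match for CNB; the caller chooses the direction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NBestOrdering {

    // Hypothetical stand-in for Mahout's ClassifierResult: a label plus a score.
    public static final class ScoredLabel {
        public final String category;
        public final double score;

        public ScoredLabel(String category, double score) {
            this.category = category;
            this.score = score;
        }
    }

    // Returns a sorted copy of the results; the input list is left untouched.
    // The caller decides the direction instead of relying on the order returned
    // by the classifier.
    public static List<ScoredLabel> sortedByScore(List<ScoredLabel> results,
                                                  boolean descending) {
        List<ScoredLabel> copy = new ArrayList<>(results);
        Comparator<ScoredLabel> byScore = Comparator.comparingDouble(r -> r.score);
        copy.sort(descending ? byScore.reversed() : byScore);
        return copy;
    }

    public static void main(String[] args) {
        // Scores taken from the 47-best example in the report (elided entries omitted).
        List<ScoredLabel> nBest = new ArrayList<>();
        nBest.add(new ScoredLabel("Kagawa", 32.28281232047167));
        nBest.add(new ScoredLabel("Miyazaki", 32.28969992600906));
        nBest.add(new ScoredLabel("Aichi", 32.487981016587796));
        nBest.add(new ScoredLabel("Tokyo", 32.49189358054859));
        nBest.add(new ScoredLabel("Hokkaido", 32.49811200756193));

        // Print the head of the list under each ordering.
        System.out.println("highest score first: "
                + sortedByScore(nBest, true).get(0).category);
        System.out.println("lowest score first:  "
                + sortedByScore(nBest, false).get(0).category);
    }
}
```

Sorting a copy keeps the list returned by the classifier intact, so the same result can be inspected in both orders while the question above is open.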