[ 
https://issues.apache.org/jira/browse/MAHOUT-605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003114#comment-13003114
 ] 

Robin Swezey commented on MAHOUT-605:
-------------------------------------

Hello,

On the mailing list, Robin A said that the weights are to be considered as 
negative numbers. That settles the question of whether this is CNB or not: 
it is indeed CNB.

I have two quick questions before the issue is resolved:

1/ This implementation is a TWCNB, if I am not mistaken?

2/ If we re-run the SinglyClassifier2 example of my previous post 
(http://pastebin.com/VMUVGmUd lines 57-61), but limit the number of results to 
3 like this:

bq. String[] doc = {"mspublisher", "parallax", "polaroid", "corel", "illustrator", "coreldraw"};
bq. SinglyClassifier2 sc = new SinglyClassifier2();
bq. List<ClassifierResult> results = sc.classifyDocument(doc, "comp.graphics", 3);

Then we get as output:

bq. [ClassifierResult{category='sci.med', score=71.12823038989241}, 
ClassifierResult{category='talk.politics.mideast', score=71.12905966433597}, 
ClassifierResult{category='sci.crypt', score=71.13190725486677}]

So this is not an N-best output, but actually an N-worst one. Hence, there 
might still be a problem with this line in 
CBayesAlgorithm.classifyDocument(String[] document, Datastore datastore, 
String defaultCategory, int numResults):

bq. Collections.reverse(result);
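
To make the N-best expectation concrete, here is a minimal, self-contained 
sketch. It is not Mahout's code: Result is a hypothetical stand-in for 
ClassifierResult, nBest and the lowerIsBetter flag are names invented for this 
illustration, and the two "hypothetical.*" scores are made up (only the three 
scores from the output above are real). It only shows that which end of the 
sorted list gets truncated decides whether the caller sees the N best or the 
N worst categories. It uses Java 8 syntax for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration only; this is not Mahout code.
final class NBestSketch {

    // Hypothetical stand-in for Mahout's ClassifierResult.
    static final class Result {
        final String category;
        final double score;

        Result(String category, double score) {
            this.category = category;
            this.score = score;
        }

        @Override
        public String toString() {
            return category + "=" + score;
        }
    }

    // Returns at most n results, best first. Whether a lower or a higher
    // score counts as "best" is passed in explicitly, because that sign
    // convention is exactly what is being discussed in this issue.
    static List<Result> nBest(List<Result> all, int n, boolean lowerIsBetter) {
        List<Result> sorted = new ArrayList<>(all);
        Comparator<Result> byScore = Comparator.comparingDouble(r -> r.score);
        // Sort so that the best result comes first, then truncate.
        sorted.sort(lowerIsBetter ? byScore : byScore.reversed());
        int limit = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    // Three scores copied from the output above, plus two made-up ones.
    static List<Result> sample() {
        List<Result> all = new ArrayList<>();
        all.add(new Result("talk.politics.mideast", 71.12905966433597));
        all.add(new Result("hypothetical.high", 71.5)); // made-up score
        all.add(new Result("sci.med", 71.12823038989241));
        all.add(new Result("hypothetical.low", 70.9));  // made-up score
        all.add(new Result("sci.crypt", 71.13190725486677));
        return all;
    }

    public static void main(String[] args) {
        // The same data and the same n give different categories depending
        // on which end of the ordering is kept.
        System.out.println(nBest(sample(), 3, true));
        System.out.println(nBest(sample(), 3, false));
    }
}
```

If the sort direction and the truncation do not agree on what "best" means, a 
final Collections.reverse only changes the order of the N results that were 
kept, not which N were kept.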

> Array returned by classifier.bayes.algorithm.CBayesAlgorithm.classifyDocument 
> is sorted ascendant
> -------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-605
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-605
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.4
>         Environment: Linux
>            Reporter: Robin Swezey
>            Assignee: Robin Anil
>            Priority: Minor
>              Labels: bayesian, classification
>             Fix For: 0.5
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The array returned for a n-best call to classifyDocument is sorted ascendant 
> instead of descendant. 
> Ex:
> {quote}
> 47-best: [ClassifierResult\{category='香川県', score=32.28281232047167\},
> ClassifierResult\{category='宮崎県', score=32.28969992600906\}, ......,
> ClassifierResult\{category='愛知県', score=32.487981016587796\},
> ClassifierResult\{category='東京都', score=32.49189358054859\},
> ClassifierResult\{category='北海道', score=32.49811200756193\}]
> {quote}
> (classification of documents for Japanese prefectures)
> Inside the classifyDocument method, just before the return statement we found 
> this line:
> {quote}
> Collections.reverse(result);
> {quote}
> Is this a mistake or a design choice? (we are not sure, hence the "Minor" 
> priority)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

