[ 
https://issues.apache.org/jira/browse/MAHOUT-605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994030#comment-12994030
 ] 

Robin Swezey edited comment on MAHOUT-605 at 2/13/11 8:39 AM:
--------------------------------------------------------------

Robin, Sean

This is Robin S.

Thank you for your quick answer and your responsiveness.

Let me explain why my professor asked this question. We use 
org.apache.mahout.classifier.bayes.algorithm.CBayesAlgorithm to output either 
a single class or an array of classes for the most probable prefecture given a 
news article. In that class, the following code appears in public 
ClassifierResult[] classifyDocument(String[] document, Datastore datastore, 
String defaultCategory, int numResults):

for (String category : categories) {
     double prob = documentWeight(datastore, category, document);

However, we ran a diff on the documentWeight method between the CNB version 
and its NB counterpart, and the two are identical. The same is true of public 
ClassifierResult[] classifyDocument(String[] document, Datastore datastore, 
String defaultCategory, int numResults).

The sorting code is also identical in the NB and CNB Algorithm classes: the 
Collections.reverse(result); line is present in both versions.
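
To make the question concrete, here is a minimal, self-contained sketch of the 
behavior we are describing. It is not the Mahout code: the Result class, the 
nBest method, and the reverseAtEnd flag are hypothetical stand-ins, and the 
prefecture names are romanized. It only assumes that the results are already 
ordered best-first before Collections.reverse is applied, in which case the 
final call would produce the ascending list we observe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ReverseOrderSketch {

    // Hypothetical stand-in for Mahout's ClassifierResult: a label and a score.
    static final class Result {
        final String category;
        final double score;

        Result(String category, double score) {
            this.category = category;
            this.score = score;
        }
    }

    // Sorts a copy of the input by score, highest first. If reverseAtEnd is
    // true, the list is then reversed, mimicking a trailing
    // Collections.reverse(result) call, so the lowest score comes first.
    static List<Result> nBest(List<Result> scored, boolean reverseAtEnd) {
        List<Result> result = new ArrayList<>(scored);
        result.sort(Comparator.comparingDouble((Result r) -> r.score).reversed());
        if (reverseAtEnd) {
            Collections.reverse(result);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three scores taken from the example in the issue description.
        List<Result> scored = Arrays.asList(
            new Result("Tokyo", 32.49189358054859),
            new Result("Kagawa", 32.28281232047167),
            new Result("Hokkaido", 32.49811200756193));

        // With the trailing reverse, the lowest-scoring category is printed first.
        for (Result r : nBest(scored, true)) {
            System.out.println(r.category + " " + r.score);
        }
    }
}
```

Calling nBest(scored, false) instead returns the highest score first, which is 
the order we expected from an n-best call.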

Are the weights of different kinds in the two cases? One of the training 
drivers seems to differ for CNB (Complementary Bayes Theta Normalizer Driver); 
is this related? If so, why does the result need to be sorted in ascending 
order, which is done in both cases?

This portion of the code looks a little confusing, hence our question.

Thank you again for your responsiveness.

Robin S.

> Array returned by classifier.bayes.algorithm.CBayesAlgorithm.classifyDocument 
> is sorted ascendant
> -------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-605
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-605
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.4
>         Environment: Linux
>            Reporter: Robin Swezey
>            Assignee: Robin Anil
>            Priority: Minor
>              Labels: bayesian, classification
>             Fix For: 0.5
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The array returned for an n-best call to classifyDocument is sorted in 
> ascending order instead of descending order. 
> Ex:
> {quote}
> 47-best: [ClassifierResult\{category='香川県', score=32.28281232047167\},
> ClassifierResult\{category='宮崎県', score=32.28969992600906\}, ......,
> ClassifierResult\{category='愛知県', score=32.487981016587796\},
> ClassifierResult\{category='東京都', score=32.49189358054859\},
> ClassifierResult\{category='北海道', score=32.49811200756193\}]
> {quote}
> (classification of documents for Japanese prefectures)
> Inside the classifyDocument method, just before the return statement we found 
> this line:
> {quote}
> Collections.reverse(result);
> {quote}
> Is this a mistake or a design choice? (we are not sure, hence the "Minor" 
> priority)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

