[ 
https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168204#comment-13168204
 ] 

Anatoliy Kats commented on MAHOUT-925:
--------------------------------------

That's a good point: we should be careful about how we analyze undersampled 
data.  The purpose of measuring reach is to predict what percentage of the 
audience in a production system will get the required number of 
recommendations.  I think the easiest way to do this is to loop over the 
users and try to generate recommendations from a model that does not exclude 
any preferences.
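
A minimal sketch of that loop, to make the definition concrete.  It does not 
use Mahout's actual classes: the class name ReachSketch, the method reach, and 
the BiFunction standing in for a recommender built on the full, unfiltered 
model are all hypothetical.  A user counts toward reach only if the 
recommender returns at least the requested number of items.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;

public class ReachSketch {

    /**
     * Returns the fraction of users who receive at least howMany
     * recommendations. The recommend argument is a hypothetical stand-in
     * for a recommender: (userID, howMany) -> recommended item IDs.
     */
    public static double reach(List<Long> userIDs,
                               BiFunction<Long, Integer, List<Long>> recommend,
                               int howMany) {
        if (userIDs.isEmpty()) {
            return 0.0; // no users, so nothing to measure
        }
        int served = 0;
        for (long userID : userIDs) {
            // Ask the recommender directly; it decides whether it can answer.
            List<Long> items = recommend.apply(userID, howMany);
            if (items.size() >= howMany) {
                served++;
            }
        }
        return (double) served / userIDs.size();
    }

    public static void main(String[] args) {
        // Toy recommender (an assumption for illustration only):
        // even user IDs get two items, odd user IDs get none.
        BiFunction<Long, Integer, List<Long>> toy = (userID, n) ->
            userID % 2 == 0
                ? Arrays.asList(10L, 11L)
                : Collections.<Long>emptyList();
        System.out.println(reach(Arrays.asList(1L, 2L, 3L, 4L), toy, 2));
    }
}
```

In the toy run, two of the four users get the two requested items, so the 
computed reach is one half.  Varying howMany gives reach as a function of the 
number of requested recommendations.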

Also, in the spirit of creating conditions as close as possible to a 
production environment, it seems unfair to exclude users because the 
evaluator judges that not enough preferences remain (lines 116-118 in the 
patched code).  The recommender should decide for itself whether or not to 
generate anything.  Only if it fails to generate the required number of 
recommendations should we exclude the user from the IR statistics.  With 
this change, precision and recall would always be equal.  They often are in 
practice.  What was the original motivation for including both statistics?
                
> Evaluate the reach of recommender algorithms
> --------------------------------------------
>
>                 Key: MAHOUT-925
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-925
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Anatoliy Kats
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: MAHOUT-925.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The evaluation of a CF algorithm should include reach, the proportion of 
> users for whom a recommendation could be made.  An algorithm usually has a 
> cutoff value on the confidence of the recommender, and if it is not high 
> enough, no recommendation is made.  Either the number of requested 
> recommendations or this cutoff could be varied as part of the evaluation.  
> The proposed patch adds this.
> My build with this patch breaks 
> testMapper(org.apache.mahout.classifier.df.mapreduce.partial.Step1MapperTest):
>  org.apache.mahout.classifier.df.node.Leaf.<init>(I)V .  The test seems 
> unrelated to the patch, so I am assuming this is broken in the trunk head as 
> well.  Unfortunately I am under a deadline, and I do not have time to write 
> tests for the patch.

