[ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168389#comment-13168389 ]
Ted Dunning commented on MAHOUT-925:
------------------------------------

Reach is a nice statistic to have, but I think it can be had more simply than this.

In my experience, the quality of recommendations depends very strongly on the number of items in the history. Where the history is too small, recommendations will typically be pretty poor; above a threshold, they will be as good as they are going to be. For music, that threshold was 5-10 items; for video it was comparable.

If this is true, then the reach computation can be broken into two parts:

a) What is the threshold?
b) How many people reach the threshold?

The first question is answerable by the standard precision/recall measurement methods, except that the resulting data need to be averaged with an awareness of the history size so that the threshold can be detected. The second question is simple arithmetic and doesn't need a framework.

> Evaluate the reach of recommender algorithms
> --------------------------------------------
>
>                 Key: MAHOUT-925
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-925
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Anatoliy Kats
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: MAHOUT-925.patch, MAHOUT-925.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The evaluation of a CF algorithm should include reach, the proportion of
> users for whom a recommendation could be made. An algorithm usually has a
> cutoff value on the confidence of the recommender, and if it is not high
> enough, no recommendation is made. The number of requested recommendations,
> or this parameter, could be varied as part of the evaluation. The proposed
> patch adds this.
> My build with this patch breaks
> testMapper(org.apache.mahout.classifier.df.mapreduce.partial.Step1MapperTest):
> org.apache.mahout.classifier.df.node.Leaf.<init>(I)V.
> The test seems unrelated to the patch, so I am assuming this is broken in the
> trunk head as well. Unfortunately I am under a deadline, and I do not have
> time to write tests for the patch.
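A minimal sketch of the two-step reach computation described in the comment above, assuming per-user precision values (for example precision@k) have already been produced by an ordinary hold-out evaluation. The function names, the `tolerance` and `min_users` parameters, and the rule "the threshold is the smallest history size whose mean precision is within 10% of the best bucket" are all hypothetical illustrative choices, not part of Mahout's API. The sample numbers are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical helpers for illustration only; not part of Mahout.


def precision_by_history_size(per_user_results, min_users=1):
    """Step (a), part 1: average per-user precision separately per history size.

    per_user_results: iterable of (history_size, precision) pairs, one per
    test user. Buckets with fewer than min_users users are dropped as noisy.
    """
    buckets = defaultdict(list)
    for history_size, precision in per_user_results:
        buckets[history_size].append(precision)
    return {
        size: mean(values)
        for size, values in sorted(buckets.items())
        if len(values) >= min_users
    }


def find_threshold(curve, tolerance=0.1):
    """Step (a), part 2: smallest history size whose mean precision is within
    `tolerance` (a fraction) of the best bucket mean, i.e. where the curve has
    roughly leveled off. Returns None for an empty curve."""
    if not curve:
        return None
    plateau = max(curve.values())
    for size in sorted(curve):
        if curve[size] >= (1.0 - tolerance) * plateau:
            return size
    return None  # not reached for non-negative precisions


def reach(history_sizes, threshold):
    """Step (b): fraction of users whose history has at least `threshold` items."""
    sizes = list(history_sizes)
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if s >= threshold) / len(sizes)


if __name__ == "__main__":
    # Toy, made-up (history_size, precision) pairs purely for illustration.
    results = [(1, 0.02), (1, 0.04), (2, 0.05), (3, 0.08),
               (5, 0.15), (5, 0.17), (8, 0.16), (8, 0.18), (12, 0.17)]
    curve = precision_by_history_size(results)
    threshold = find_threshold(curve)
    all_users = [0, 1, 2, 3, 4, 5, 5, 8, 8, 12]  # history sizes of every user
    print(threshold, reach(all_users, threshold))  # prints: 5 0.5
```

The plateau rule here is deliberately crude; with real data it would probably be better to bucket history sizes into ranges or smooth the curve before picking the threshold. As the comment says, the second step really is just a count.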