On Wed, Mar 3, 2010 at 9:32 AM, Mirko <[email protected]> wrote:

> I finally got some PR results from my evaluation. But I also became aware
> how painful recommender evaluation is. I don't think that
> RecommenderEvaluator is to be favored over PR in general. It depends on the
> use case of the recommender. For example, in some use cases MAE doesn't say
> much about the quality of recommendations.
I think they're answering different questions, and either or both may or may
not be useful in a given circumstance. In a sense, evaluating the quality of
predictions is slightly the wrong question to ask. After all, a recommender's
primary job is to produce ordered recommendations, only. It does not
necessarily need to predict preference values to do this, though most do.
Prediction accuracy is certainly a common concept and approach for
recommenders (this was how Netflix recommenders were scored, for instance),
so it's available to you. But it isn't even applicable in all cases, such as
when you have no preference values at all!

> In particular, I would be interested in what you think about novelty in the
> context of evaluation. I think novelty is a big potential benefit of
> recommenders. Depending on the use case, novel recommendations are
> preferable to non-novel ones. However, both MAE and PR punish novelty, if I
> understand correctly. More novel recommendations mean worse results for both
> MAE and PR. Do you know of offline evaluation methods that consider novelty,
> or any literature on this issue?

Yes, this is exactly why I think using precision and recall also has
problems. It 'punishes' the recommender for making novel recommendations:
all it wants to see is that the recommender suggested items the user already
knew about. MAE doesn't really punish the recommender for this; it just
can't incorporate novel recommendations into the final evaluation, since
there is no 'real' preference to benchmark against.

I don't have a good reference for you, but I think there's really only one
way forward for evaluation: collect data about how often your recommended
items were viewed / clicked, and how they were rated. That is, you'd really
have to deploy the recommender and evaluate it going forward. I can't
imagine any other solution, since it is necessarily based on information you
don't have yet.
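To make the point concrete, here's a toy sketch (plain Python, not the
Mahout API; all item names and ratings are made up) of how a novel
recommendation affects the two metrics differently: MAE simply skips an
item with no held-out rating, while precision@k counts it as a miss even if
the user would have loved it.

```python
# Toy illustration: MAE ignores novel items, precision@k punishes them.

def mae(predicted, actual):
    """Mean absolute error over the items that have a known actual rating."""
    common = [i for i in predicted if i in actual]
    return sum(abs(predicted[i] - actual[i]) for i in common) / len(common)

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items found in the relevant set."""
    return sum(1 for i in recommended[:k] if i in relevant) / k

# Held-out ratings for one user (the "ground truth" an offline test can see).
actual = {"item_a": 5.0, "item_b": 4.0, "item_c": 2.0}

# The recommender's predictions; "item_x" is novel -- the user never rated it.
predicted = {"item_a": 4.5, "item_b": 4.0, "item_x": 5.0}

# MAE uses only item_a and item_b; item_x is simply left out of the average.
print(mae(predicted, actual))  # (0.5 + 0.0) / 2 = 0.25

# Precision@3: item_x counts as a miss, no matter how good a suggestion it is.
recommended = ["item_x", "item_a", "item_b"]
relevant = {i for i, r in actual.items() if r >= 4.0}  # {"item_a", "item_b"}
print(precision_at_k(recommended, relevant, 3))  # 2/3
```

Note that if item_x were replaced by a third already-known relevant item,
precision@3 would rise to 1.0, which is exactly the sense in which the
metric rewards re-recommending what the user has already seen.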
