On Wed, Mar 3, 2010 at 9:32 AM, Mirko <[email protected]> wrote:

> I finally got some PR results from my evaluation. But I also became aware
> how painful recommender evaluation is. I don't think that
> RecommenderEvaluator is to be favored over PR in general. It depends on the
> use case of the recommender. For example, in some use cases MAE doesn't say
> much about the quality of recommendations.
I think they're answering different questions, and either or both may or may
not be useful in a given circumstance. In a sense, evaluating the quality of
predictions is slightly the wrong question to ask. After all, a recommender's
primary job is to produce ordered recommendations, only. It does not
necessarily need to predict preference values to do this, though most do.
Prediction accuracy is certainly a common concept and approach for
recommenders (this was how Netflix recommenders were scored, for instance),
so it's available to you. But it isn't even applicable in all cases, such as
when you have no preference values at all!

> In particular, I would be interested in what you think about novelty in the
> context of evaluation. I think novelty is a big potential benefit of
> recommenders. Depending on the use case, novel recommendations are
> preferable to non-novel ones. However, both MAE and PR punish novelty, if I
> understand correctly. More novel recommendations mean worse results for both
> MAE and PR. Do you know of offline evaluation methods that consider novelty,
> or any literature on this issue?

Yes, this is exactly why I think using precision and recall also has
problems. It 'punishes' the recommender for making novel recommendations:
all it wants to see is that the recommender suggested items the user already
knew about. MAE doesn't really punish the recommender for this; it just
can't incorporate novel recommendations into the final evaluation, since
there is no 'real' preference to benchmark against.

I don't have a good reference for you, but I think there's really only one
way forward for evaluation: collect data about how often your recommended
items were viewed / clicked, and how they were rated. That is, you'd really
have to deploy the recommender and evaluate it going forward. I can't
imagine any other solution, since it is necessarily based on information you
don't have yet.
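To make the point concrete, here's a toy sketch (plain Python, not the
Mahout API; all item names and ratings are made up) of how a novel
recommendation affects the two metrics differently: MAE simply skips an
item with no held-out rating, while precision@k counts it as a miss even if
the user would have loved it.

```python
# Toy illustration: MAE ignores novel items, precision@k punishes them.

def mae(predicted, actual):
    """Mean absolute error over the items that have a known actual rating."""
    common = [i for i in predicted if i in actual]
    return sum(abs(predicted[i] - actual[i]) for i in common) / len(common)

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items found in the relevant set."""
    return sum(1 for i in recommended[:k] if i in relevant) / k

# Held-out ratings for one user (the "ground truth" an offline test can see).
actual = {"item_a": 5.0, "item_b": 4.0, "item_c": 2.0}

# The recommender's predictions; "item_x" is novel -- the user never rated it.
predicted = {"item_a": 4.5, "item_b": 4.0, "item_x": 5.0}

# MAE uses only item_a and item_b; item_x is simply left out of the average.
print(mae(predicted, actual))  # (0.5 + 0.0) / 2 = 0.25

# Precision@3: item_x counts as a miss, no matter how good a suggestion it is.
recommended = ["item_x", "item_a", "item_b"]
relevant = {i for i, r in actual.items() if r >= 4.0}  # {"item_a", "item_b"}
print(precision_at_k(recommended, relevant, 3))  # 2/3
```

Note that if item_x were replaced by a third already-known relevant item,
precision@3 would rise to 1.0, which is exactly the sense in which the
metric rewards re-recommending what the user has already seen.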
