Hi Sean and all,

First, your book is really helpful documentation of Taste -- not only as a general introduction, but also for looking things up while developing or when wondering about results. (And I'm not getting paid for saying this.)
I finally got some precision/recall (PR) results from my evaluation. But I also became aware of how painful recommender evaluation is. I don't think that RecommenderEvaluator should be favored over PR in general; it depends on the use case of the recommender. For example, in some use cases MAE doesn't say much about the quality of the recommendations. (A stripped-down sketch of my evaluation setup is below the quoted mail, in case the concrete calls help.)

In particular, I would be interested in what you think about novelty in the context of evaluation. I think novelty is one of the big potentials of recommenders: depending on the use case, novel recommendations are preferable to non-novel ones. However, if I understand correctly, both MAE and PR punish novelty, i.e. more novel recommendations mean worse results for both. Do you know of offline evaluation methods that take novelty into account, or any literature on this issue?

Regards,
Mirko

On 24.02.2010, at 22:09, Sean Owen wrote:

> On Wed, Feb 24, 2010 at 8:51 PM, Mirko <[email protected]> wrote:
>> Is it possible that my ratings are problematic? Below is a sample from my data set. My ratings are all very similar. I derived the ratings from a "usage score" and normalized them to a value between 0 and 1. For example, if a user used an item once, the normalized rating is 0.0022271716. This is the case for most preferences. When a user uses an item more often, the rating increases slightly.
>
> In some cases, this could be a problem. For example, the Pearson correlation is undefined when one of the two data series has values that are all the same. If you were using PearsonCorrelationSimilarity, I'd believe this could be interfering with its ability to calculate similarities in many cases.
>
> But you're using LogLikelihoodSimilarity, which actually doesn't even look at the preference values.
>
>> I thought this approach could give better results than a boolean model. Would you agree, or is this approach weird?
>
> It's really hard to say. You'd think that more data is better, but it's not always. You are actually using your rating values, just not in the similarity computation. Sometimes throwing out the data is better, if it's so noisy or unrepresentative of real preferences that it's just doing harm.
>
> I'm not sure there is a good rule of thumb about when to use the ratings or not. That's why there's a fairly robust evaluation framework to let you just try it out on your data.
>
> This is also the point where we should wheel in Ted D. for comment, since he has dealt with this sort of thing for years.
>
>> This is very interesting! Could you please give me the subject of this thread on mahout-dev? I can't find it in the archive.
>
> Oops, the discussion I was thinking of was actually here:
> http://issues.apache.org/jira/browse/MAHOUT-305
>
>> Is a Boolean DataModel one without ratings? Could I use the RecommenderEvaluator with a Boolean DataModel?
>
> Yes, it is. No, you can't use RecommenderEvaluator. The framework thinks of such a DataModel as one where all ratings are 1.0. RecommenderEvaluator assesses the difference between predicted and actual ratings, but that makes no sense in a world where all ratings are always 1.0.
>
> Let me shamelessly plug Mahout in Action (http://manning.com/owen), which has a good bit of coverage of evaluation. But I don't think you're doing something wrong here that the book would clear up.
>
> I don't see anything obviously wrong. What you could do, if you have a bit of motivation, is simply step through the core evaluation method with a debugger.
> After 10-20 loops I'd imagine you'll get a sense of what's going on -- is it unable to create test data? Are recommendations empty? Etc. If you can spot anything odd like that, that would be a big clue to me.
>
> Then, if we're still stumped, perhaps I can ask you for your data separately, so I can investigate myself.
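
P.S. Here is the stripped-down sketch of my evaluation setup that I mentioned above, in case the concrete calls make the question clearer. The file name "usage.csv", the neighborhood size of 10, the "at 10" cutoff, and the 90/10 split are just placeholders, and I retyped this from memory, so it may not match the exact signatures in the release I'm on -- please read it as a sketch rather than my literal code:

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class EvaluationSketch {

  public static void main(String[] args) throws Exception {
    // "usage.csv" is a placeholder: lines of userID,itemID,normalizedUsageScore
    DataModel model = new FileDataModel(new File("usage.csv"));

    // The same recommender goes into both evaluations: log-likelihood similarity
    // (which ignores the preference values) plus a user-based recommender.
    RecommenderBuilder builder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        UserSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, dataModel);
        return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
      }
    };

    // MAE: average absolute difference between estimated and actual preference
    // values, training on 90% of the data.
    RecommenderEvaluator maeEvaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    double mae = maeEvaluator.evaluate(builder, null, model, 0.9, 1.0);

    // Precision/recall at 10, letting the evaluator pick the relevance threshold,
    // since all my preference values are tiny.
    RecommenderIRStatsEvaluator irEvaluator = new GenericRecommenderIRStatsEvaluator();
    IRStatistics stats = irEvaluator.evaluate(builder, null, model, null, 10,
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);

    System.out.println("MAE: " + mae);
    System.out.println("Precision@10: " + stats.getPrecision());
    System.out.println("Recall@10: " + stats.getRecall());
  }
}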
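
For the boolean comparison I would only run the precision/recall evaluator, since, as you explain, MAE is meaningless when every preference is 1.0. Inside the same main as above, and with the two extra imports noted in the comments, I have something like the following in mind -- please correct me if I'm getting the boolean-DataModel conversion wrong:

    // (added to the imports above)
    // import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
    // import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;

    // Drop the preference values entirely: every (user, item) pair becomes a 1.0.
    DataModel booleanModel = new GenericBooleanPrefDataModel(
        GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("usage.csv"))));

    RecommenderBuilder booleanBuilder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        UserSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, dataModel);
        return new GenericBooleanPrefUserBasedRecommender(dataModel, neighborhood, similarity);
      }
    };

    // Precision/recall only -- no MAE here, because predicted and actual ratings
    // cannot differ when all ratings are 1.0.
    IRStatistics booleanStats = new GenericRecommenderIRStatsEvaluator().evaluate(
        booleanBuilder, null, booleanModel, null, 10,
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);

    System.out.println("Boolean precision@10: " + booleanStats.getPrecision());
    System.out.println("Boolean recall@10: " + booleanStats.getRecall());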
