On Wed, Feb 24, 2010 at 8:51 PM, Mirko <[email protected]> wrote:
> Is it possible that my ratings are problematic? Below is a sample from my
> data set. My ratings are all very similar. I derived the ratings from a
> "usage score" and normalized to a value between 0 and 1. For example, if a
> user used an item 1 time, the normalized rating is 0.0022271716. This is
> the case for most preferences. When a user used an item more often, the
> rating (slightly) increases.
In some cases, this could be a problem. For example, the Pearson correlation
is undefined when one of the two data series has values that are all the same.
If you were using PearsonCorrelationSimilarity, I'd believe this could be
interfering with its ability to calculate similarities in many cases. (There's
a tiny illustration of that at the very end of this mail.) But you're using
LogLikelihoodSimilarity, which doesn't even look at the preference values.

> I thought this approach could give better results than a boolean model.
> Would you agree or is this approach weird?

It's really hard to say. You'd think that more data is better, but it's not
always. You are actually using your rating values, just not in the similarity
computation. Sometimes throwing out the data is better, if it's so noisy or
unrepresentative of real preferences that it's just doing harm. I'm not sure
there is a good rule of thumb about when to use the ratings or not. That's why
there's a fairly robust evaluation framework to let you just try it out on
your data; I've pasted a rough sketch of what that looks like below. This is
also the point where we should wheel in Ted D. for comment, since he has dealt
with this sort of thing for years.

> This is very interesting! Could you please give me the subject of this
> thread on mahout-dev? I can't find it in the archive.

Oops, the discussion I was thinking of was actually here:
http://issues.apache.org/jira/browse/MAHOUT-305

> Is a Boolean DataModel one without ratings? Could I use the
> RecommenderEvaluator with a Boolean DataModel?

Yes, it is. No, you can't use RecommenderEvaluator. The framework treats such
a DataModel as one where all ratings are 1.0. RecommenderEvaluator assesses
the difference between predicted and actual ratings, but that makes no sense
in a world where all ratings are always 1.0.

Let me shamelessly plug Mahout in Action (http://manning.com/owen), which has
a good bit of coverage about evaluation. But I don't think you're doing
something wrong here that the book would clear up. I don't see anything
obviously wrong.

What you could do, if you have a bit of motivation, is simply step through the
core evaluation method with a debugger. After 10-20 loops I'd imagine you'll
get a sense of what's going on -- is it unable to create test data? are
recommendations empty? etc. If you can spot anything odd like that, that would
be a big clue to me. Then, if we're still stumped, perhaps I can ask you for
your data, separately, so I can investigate myself.
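For reference, here is roughly what driving that evaluation framework looks
like. Treat it as a sketch only: I'm assuming a user-based recommender built
on LogLikelihoodSimilarity and a preferences file called ratings.csv (both
just placeholders for whatever you actually have), and the exact evaluate(...)
signature has shifted a little between Mahout releases, so check it against
the version you're running.

  import java.io.File;

  import org.apache.mahout.cf.taste.common.TasteException;
  import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
  import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
  import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
  import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
  import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
  import org.apache.mahout.cf.taste.model.DataModel;
  import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
  import org.apache.mahout.cf.taste.recommender.Recommender;
  import org.apache.mahout.cf.taste.similarity.UserSimilarity;

  public class EvaluationSketch {

    public static void main(String[] args) throws Exception {
      // "ratings.csv" is a placeholder: userID,itemID,rating per line
      DataModel model = new FileDataModel(new File("ratings.csv"));

      // Builds the recommender under test. Log-likelihood similarity ignores
      // the preference values; they are only used when estimating preferences.
      RecommenderBuilder builder = new RecommenderBuilder() {
        public Recommender buildRecommender(DataModel trainingModel) throws TasteException {
          UserSimilarity similarity = new LogLikelihoodSimilarity(trainingModel);
          UserNeighborhood neighborhood =
              new NearestNUserNeighborhood(10, similarity, trainingModel);
          return new GenericUserBasedRecommender(trainingModel, neighborhood, similarity);
        }
      };

      // Hold out 10% of each user's preferences as test data, use all users.
      RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
      double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);

      // NaN here usually means no estimated/actual pairs could be compared,
      // which is exactly the kind of thing worth stepping through in a debugger.
      System.out.println("Average absolute difference: " + score);
    }
  }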

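One more thing, on the Pearson point at the top of this mail: here's a tiny,
self-contained illustration (plain Java, no Mahout classes, and the numbers
are made up in the spirit of your sample data). When one of the two co-rated
series is constant, its sum of squared deviations is zero, so the correlation
comes out as 0/0 -- NaN, i.e. undefined.

  public class PearsonUndefined {

    // Plain Pearson correlation of two equal-length series.
    static double pearson(double[] x, double[] y) {
      int n = x.length;
      double meanX = 0.0;
      double meanY = 0.0;
      for (int i = 0; i < n; i++) {
        meanX += x[i];
        meanY += y[i];
      }
      meanX /= n;
      meanY /= n;
      double sumXY = 0.0;
      double sumXX = 0.0;
      double sumYY = 0.0;
      for (int i = 0; i < n; i++) {
        double dx = x[i] - meanX;
        double dy = y[i] - meanY;
        sumXY += dx * dy;
        sumXX += dx * dx;
        sumYY += dy * dy;
      }
      // If either series is constant, sumXX or sumYY is 0, so this is 0/0.
      return sumXY / Math.sqrt(sumXX * sumYY);
    }

    public static void main(String[] args) {
      double[] a = {0.0022, 0.0045, 0.0067};  // one user's ratings on co-rated items
      double[] b = {0.0022, 0.0022, 0.0022};  // another user's ratings, all identical
      System.out.println(pearson(a, b));      // prints NaN
    }
  }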