Yeah, the problem here is that all the ratings are '1'. A correlation-based similarity metric like Pearson needs some variation in the ratings to work with; when every value is the same the variance is zero, so it returns "NaN" for the similarity between all users.
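To make that concrete, here is a small standalone sketch. It is plain Java, not the library's actual code, and the class and method names are made up for illustration. It computes a textbook Pearson correlation over two users' ratings and, for comparison, a Tanimoto (Jaccard) coefficient over the sets of items each user bought, which only looks at overlap and ignores the rating values.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: hypothetical names, not the recommender library's implementation.
public class BooleanSimilaritySketch {

    // Textbook Pearson correlation over two equally long rating arrays.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0.0, sumY = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // With constant ratings both variances are 0, so this is 0.0 / 0.0 = NaN.
        return cov / Math.sqrt(varX * varY);
    }

    // Tanimoto coefficient: |A intersect B| / |A union B| over the items each user has.
    static double tanimoto(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? Double.NaN : (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Two users who each bought three items, all "rated" 1.
        double[] ratings1 = {1.0, 1.0, 1.0};
        double[] ratings2 = {1.0, 1.0, 1.0};
        System.out.println("Pearson:  " + pearson(ratings1, ratings2));

        // The same kind of purchases viewed as item sets, with two items in common.
        Set<String> items1 = new HashSet<>(Arrays.asList("A", "B", "C"));
        Set<String> items2 = new HashSet<>(Arrays.asList("B", "C", "D"));
        System.out.println("Tanimoto: " + tanimoto(items1, items2));
    }
}
```

On this toy data the Pearson line prints NaN, while the Tanimoto line prints 0.5 (two shared items out of four distinct ones).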
You want to take advantage of the situation by using the bits of code that assume you are in exactly this case, where all the ratings are the same, or 1, or don't matter. Support for this mode is still evolving a bit, but basically you want to:

- Use BooleanTanimotoCoefficientSimilarity instead of Pearson.
- Omit the ",1" in the data file. In fact you need to do this to get it to work.
- Separately, I'd generally discourage people from trying PreferenceInferrer unless you know you need or want it; I don't really like this technique. In fact it isn't supported by the similarity implementation above, so just remove that line.

If any problems come up, write back; I might have missed a detail there.

2009/4/27 Paul Loy <[email protected]>:
> Hi,
>
> I want to create recommendations for my customers based on boolean data.
> Essentially whether they bought a product.
>
> So this will create a csv containing:
>
> acctId, itemId, 1
>
> There is an entry in the CSV for each sale. So all entries will have a
> 'rating' of 1. Using the following example:
>
>         DataModel model = new FileDataModel(new File("data.txt"));
>
>         PearsonCorrelationSimilarity userSimilarity = new
> PearsonCorrelationSimilarity(model);
>         userSimilarity.setPreferenceInferrer(new
> AveragingPreferenceInferrer(model));
>
>         UserNeighborhood neighborhood =
>             new NearestNUserNeighborhood(1, userSimilarity, model);
>
>         Recommender recommender =
>             new GenericUserBasedRecommender(model, neighborhood,
> userSimilarity);
>         Recommender cachingRecommender = new
> CachingRecommender(recommender);
>
>         List<RecommendedItem> recommendations =
>             cachingRecommender.recommend("1967128", 10);
>
>         for (RecommendedItem item : recommendations) {
>             System.out.println(item);
>         }
>
> I get 0 recommendations even when I have seeded the file with obvious
> correlations. I'm guessing this is because all 'ratings' are 1. Is there any
> way to infer that all other items have a rating of 0, thus giving the
> algorithms something to correlate?
>
> Thanks,
>
> Paul
>
> --
> ---------------------------------------------
> Paul Loy
> [email protected]
> http://www.keteracel.com/paul
