Yeah, the problem here is that all the ratings are '1'. A
correlation-based similarity metric like Pearson needs some variance in
the ratings; with none, it returns "NaN" for the similarity between
every pair of users.
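
To make that concrete, here's a small standalone sketch of a plain Pearson
computation. It is not Mahout's code, and the class and method names are
made up for illustration. With constant ratings the covariance and both
variances are zero, so the final division is 0.0 / 0.0:

```java
// Standalone illustration only; this is not Mahout's implementation.
final class PearsonConstantRatings {

    /** Plain Pearson correlation over two equal-length rating vectors. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // With constant ratings both variances are 0, so this is 0.0 / 0.0.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        double[] userA = {1.0, 1.0, 1.0};
        double[] userB = {1.0, 1.0, 1.0};
        System.out.println(pearson(userA, userB)); // prints NaN
    }
}
```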

You want to take advantage of the situation by using the parts of the
code that are written for exactly this case, where all the ratings are
the same, or are 1, or don't matter. Support for this mode is still
evolving, but basically you want to:

- Use BooleanTanimotoCoefficientSimilarity instead of Pearson.
- Omit the ",1" in the data file. In fact you need to omit it to get this to work.
- Separately, I'd generally discourage using PreferenceInferrer unless
you know you need or want it; I don't really like this technique. In
fact the similarity implementation above won't support it, so just
remove that line.
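
For intuition about what a Tanimoto-style similarity does with data like
this, here's a rough standalone sketch. It is not the actual
BooleanTanimotoCoefficientSimilarity code; the class name, the helper
methods, user "42" and the item IDs are all made up for illustration. It
reads "acctId,itemId" lines with no rating column and scores two users by
the size of the overlap of their purchases divided by the size of the
union:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Standalone illustration only; not Mahout's BooleanTanimotoCoefficientSimilarity.
final class TanimotoSketch {

    /** Parses "userId,itemId" lines (no rating column) into user -> purchased items. */
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> itemsByUser = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            itemsByUser.computeIfAbsent(parts[0].trim(), k -> new HashSet<>())
                       .add(parts[1].trim());
        }
        return itemsByUser;
    }

    /** Intersection size over union size; defined here as 0 when both sets are empty. */
    static double tanimoto(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "1967128,A", "1967128,B", "1967128,C",
            "42,B", "42,C", "42,D");
        Map<String, Set<String>> itemsByUser = parse(lines);
        // The users share {B, C} out of {A, B, C, D}, so the score is 2 / 4.
        System.out.println(
            tanimoto(itemsByUser.get("1967128"), itemsByUser.get("42"))); // prints 0.5
    }
}
```

Unlike Pearson, this only needs to know which items each user has, so
identical or missing ratings are not a problem.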

If any problems come up, write back; I might have missed a detail there.

2009/4/27 Paul Loy <[email protected]>:
> Hi,
>
> I want to create recommendations for my customers based on boolean data.
> Essentially whether they bought a product.
>
> So this will create a csv containing:
>
> acctId, itemId, 1
>
> There is an entry in the CSV for each sale. So all entries will have a
> 'rating' of 1. Using the following example:
>
>        DataModel model = new FileDataModel(new File("data.txt"));
>
>        PearsonCorrelationSimilarity userSimilarity = new
> PearsonCorrelationSimilarity(model);
>        userSimilarity.setPreferenceInferrer(new
> AveragingPreferenceInferrer(model));
>
>        UserNeighborhood neighborhood =
>            new NearestNUserNeighborhood(1, userSimilarity, model);
>
>        Recommender recommender =
>            new GenericUserBasedRecommender(model, neighborhood,
> userSimilarity);
>        Recommender cachingRecommender = new
> CachingRecommender(recommender);
>
>        List<RecommendedItem> recommendations =
>            cachingRecommender.recommend("1967128", 10);
>
>        for (RecommendedItem item : recommendations) {
>            System.out.println(item);
>        }
>
> I get 0 recommendations even when I have seeded the file with obvious
> correlations. I'm guessing this is because all 'ratings' are 1. Is there any
> way to infer that all other items have a rating of 0, thus giving the
> algorithms something to correlate?
>
> Thanks,
>
> Paul
>
>
>
> --
> ---------------------------------------------
> Paul Loy
> [email protected]
> http://www.keteracel.com/paul
>
