Hi!

I'm experimenting with using the Mahout library's Taste implementation to provide product recommendations for users and to identify similar items. The data set is past sales, essentially just a boolean relationship: "customer X bought item Y". To get something simple working (I can optimize and improve later), I just used the file data model; my file looks like this:

438356039,46305
438356039,46339
438386087,56304
<another 1.5 million or so entries here>
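To make the shape of this data concrete, here is a minimal standalone sketch (not Mahout code) that reads lines in the `userID,itemID` format above into a map from each customer to the set of items they bought. The class name `BooleanSalesFile` and the method `parse` are hypothetical, and the sketch assumes every line has exactly two comma-separated numeric fields and no preference value.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BooleanSalesFile {

    // Hypothetical helper: parses "userID,itemID" lines into user -> set of purchased items.
    // Each line is a boolean fact ("customer X bought item Y"); there is no rating value.
    public static Map<Long, Set<Long>> parse(BufferedReader reader) throws IOException {
        Map<Long, Set<Long>> purchases = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            purchases.computeIfAbsent(userId, k -> new TreeSet<>()).add(itemId);
        }
        return purchases;
    }

    public static void main(String[] args) throws IOException {
        String sample = "438356039,46305\n438356039,46339\n438386087,56304\n";
        Map<Long, Set<Long>> purchases = parse(new BufferedReader(new StringReader(sample)));
        System.out.println(purchases.get(438356039L)); // [46305, 46339]
        System.out.println(purchases.get(438386087L)); // [56304]
    }
}
```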

I then create a recommender like:

DataModel model = new FileDataModel(path);
ItemSimilarity itemSimilarity = new PearsonCorrelationSimilarity(model);
ItemBasedRecommender recommender = new GenericItemBasedRecommender(model, itemSimilarity);

And then do:

List<RecommendedItem> recommended = recommender.mostSimilarItems(itemID, howMany);

However, no results are returned. I went digging for the cause and found that the itemSimilarity method in AbstractSimilarity consistently returns NaN. Looking further, I found that it did find overlapping preferences (users who had bought both items), but the various centered sums it computes all came out to zero, so computeResult always returns NaN. If I comment out the call to computeResult and replace it with one that uses the non-centered sums:
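The NaN follows from the arithmetic itself. Below is a standalone sketch (not Mahout's code; the class `PearsonOnBooleanData` and method `pearson` are hypothetical) of Pearson correlation computed from centered sums. It assumes, as the all-zero centered sums suggest, that every preference read from a two-column file carries the same value, shown here as 1.0. When every value equals its mean, both centered variances are zero and the result is 0/0, which Java evaluates to NaN.

```java
public class PearsonOnBooleanData {

    // Pearson correlation over paired (co-rated) values x[i], y[i], via centered sums.
    public static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        // Centered sums: the raw sums corrected for each variable's mean.
        double centeredSumXY = sumXY - meanY * sumX;
        double centeredSumX2 = sumX2 - meanX * sumX;
        double centeredSumY2 = sumY2 - meanY * sumY;
        // If all values equal their mean, numerator and denominator are both 0: 0/0 = NaN.
        return centeredSumXY / Math.sqrt(centeredSumX2 * centeredSumY2);
    }

    public static void main(String[] args) {
        // Boolean data: every observed preference has the same value.
        System.out.println(pearson(new double[] {1.0, 1.0, 1.0}, new double[] {1.0, 1.0, 1.0})); // NaN
        // Ratings that vary give a defined correlation.
        System.out.println(pearson(new double[] {1, 2, 3}, new double[] {2, 4, 6})); // 1.0
    }
}
```

Any constant value gives the same outcome: Pearson correlation is only defined when the preference values vary.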

// Original call, commented out:
// double result = computeResult(count, centeredSumXY, centeredSumX2, centeredSumY2, sumXYdiff2);
// Replacement using the non-centered sums:
double result = computeResult(count, sumXY, sumX2, sumY2, sumXYdiff2);

Then I do get results, and a similar change in userSimilarity makes recommend() return results too.
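For comparison, here is a sketch of what the substituted call computes, assuming computeResult divides its first sum by the square root of the product of the next two, as the Pearson formula does. Without centering, that expression is cosine similarity. The class `UncenteredOnBooleanData` is hypothetical, and the sketch assumes only preferences shared by both sides are summed.

```java
public class UncenteredOnBooleanData {

    // The same formula as Pearson but on raw (uncentered) sums: cosine similarity.
    public static double uncentered(double[] x, double[] y) {
        double sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < x.length; i++) {
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        return sumXY / Math.sqrt(sumX2 * sumY2);
    }

    public static void main(String[] args) {
        // With n shared preferences all equal to 1.0: sumXY = sumX2 = sumY2 = n,
        // so the result is n / sqrt(n * n) = 1.0 whatever the size of the overlap.
        System.out.println(uncentered(new double[] {1.0}, new double[] {1.0})); // 1.0
        System.out.println(uncentered(new double[] {1.0, 1.0, 1.0, 1.0},
                                      new double[] {1.0, 1.0, 1.0, 1.0})); // 1.0
    }
}
```

Under these assumptions the substitution does return values, but with boolean data every overlapping pair scores 1.0, so one shared customer and a thousand shared customers look the same.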

My guess is that I'm more likely doing something wrong in how I'm using Mahout than to have stumbled on a bug, and naturally I'd rather use the library "as it comes" than a patched version. :-) However, I'm not sure what I'm doing wrong, and I'm decidedly not an expert in this field, so I'm not familiar with the details of the computations being done here. Any thoughts on where I'm going wrong would be welcome. If it helps, I'm using the latest release (0.2).

Many thanks for any insight,

Jonathan
