One thing I can tell you is that mahout-0.2 will be significantly faster and use less memory. On one particular setup I am working on with a client, we needed a 1GB heap to hold 5 million ratings in memory, and about 1 second to generate recommendations. After recent changes, the same data fits in 360MB and takes about 0.3s per recommendation.
The catch is that most APIs changed significantly, so you'll have to do some work to adapt to the new code. It is available now from Subversion, and I would welcome anyone willing to try it out, as it is a big change and still quite new.

Do you mean it fails when the data set gets larger, or that you get a return value that just contains no recommendations? The latter result would be very puzzling. Is that what you see?

Beyond this, I think your system could be 'tuned' further by selecting faster, more specific implementations. For example, tell me about the nature of the 'ratings' in your system. In many cases it's actually better (and much faster) to completely ignore ratings, and there is support for that in the framework.

On Thu, Aug 13, 2009 at 8:03 PM, mishkinf<[email protected]> wrote:
>
> I have been using the mahout-0.1 release version and I am able to get
> recommendations with data sets of roughly 5 million ratings and under, but
> when I attempt 10 million or so, no recommendations are given to me. Has
> anybody had this problem? I'm not sure if I am just using the wrong
> recommender settings/recommender or if I should switch to the trunk
> version. Ideas? Suggestions?
>
> I have tried item-item recommenders, user-item recommenders, nearest
> neighborhood, tree clustering... They all produce numerous recommendations
> with the smaller data sets. In theory it should only get better with a
> larger data set.
>
> Currently I'm using an item-item recommender with cached item similarities
> and a caching recommender:
>
> ItemSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
> CachingItemSimilarity cis = new CachingItemSimilarity(similarity,
>     dataModel);
> recommender = new CachingRecommender(
>     new GenericItemBasedRecommender(dataModel, similarity));
>
> ......
>
> I would like Mahout to work with 25-50 million rows of data, but as of yet
> 5 million is the best I can do. RAM has also been an issue with larger
> data sets.
> --
> View this message in context:
> http://www.nabble.com/Mahout-not-giving-recommendations-with-large-data-sets-tp24956912p24956912.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
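To make the "ignore ratings" suggestion above concrete: the idea is to treat each user's data as a plain set of item IDs and compare those sets, for instance with the Tanimoto coefficient (Jaccard index), which is the measure behind Mahout's TanimotoCoefficientSimilarity. The class name and item IDs below are made up for illustration; this is a self-contained sketch of the measure, not Mahout's implementation:

```java
import java.util.HashSet;
import java.util.Set;

public class TanimotoSketch {

    // Tanimoto coefficient: |A ∩ B| / |A ∪ B|, in [0, 1].
    // Note that no rating values enter the computation -- only set membership.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // no evidence either way
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        // Two users who expressed a preference for overlapping sets of items
        Set<Long> user1 = new HashSet<>(Set.of(1L, 2L, 3L));
        Set<Long> user2 = new HashSet<>(Set.of(2L, 3L, 4L));
        // 2 shared items out of 4 distinct items total
        System.out.println(tanimoto(user1, user2)); // prints 0.5
    }
}
```

In Mahout itself you would presumably pair a similarity like this with a boolean-preference data model rather than hand-rolling it; the point is only that dropping rating values cuts both memory and computation.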
