To reduce recommendation time while producing online recommendations, here are the steps I follow for a very large dataset:
1. I compute item-item similarities (for all item pairs that have been rated
by at least one common user), and after some optimizations (like content
boosting) I store the k most similar items, with their degree of similarity,
for each item.
2. At recommendation time, the system takes a user history vector as input;
this vector does not need to belong to one of the users in the dataset.
3. The algorithm looks at every item in the input vector and fetches its most
similar items precomputed in step 1. If one of those most similar items has
not been rated by the user, it is added to the recommendation list.
4. The list is sorted and the top n elements are recommended.

Computing the rating for a specific item is done in a similar way. Also, if an
item appears among the most similar items of more than one item in the user
history, it is more likely to be recommended.

If you mean a system like this, I should say the implementation is mostly done
via Mahout. The 1st step is computed using the mostSimilarItems function; the
other steps are not from Mahout, but they are easy to implement (see the two
sketches after the quoted thread below).

On Sat, Feb 20, 2010 at 9:46 PM, Ted Dunning <[email protected]> wrote:

> This is just one of an infinite number of variations on item-based
> recommendation. The general idea is that you do some kind of magic to find
> item-item connections, you trim those to make it all work, and then you
> recommend the items linked from the user's history of items they liked. If
> the budget runs out (time, space or $), then you trim more. All that the
> GroupLens guys are saying is that trimming didn't hurt accuracy, so it is
> probably good to do.
>
> The off-line connection finding can be done using LLR (for moderately
> high-traffic situations), SVD (for cases where transitive dependencies are
> important), random indexing (a poor man's SVD), or LDA (where small counts
> make SVD give crazy results). There are many other possibilities as well.
>
> It would be great if you felt an itch to implement some of these and
> decided to scratch it and contribute the results back to Mahout.
>
> On Sat, Feb 20, 2010 at 6:46 AM, jamborta <[email protected]> wrote:
>
> > The basic concept of neighbourhood for item-based recommendation comes
> > from this paper:
> >
> > http://portal.acm.org/citation.cfm?id=371920.372071
> >
> > This is the idea:
> >
> > "The fact that we only need a small fraction of similar items to compute
> > predictions leads us to an alternate model-based scheme. In this scheme,
> > we retain only a small number of similar items. For each item j we
> > compute the k most similar items. We term k as the model size. Based on
> > this model building step, our prediction generation algorithm works as
> > follows. For generating predictions for a user u on item i, our algorithm
> > first retrieves the precomputed k most similar items corresponding to the
> > target item i. Then it looks how many of those k items were purchased by
> > the user u, based on this intersection then the prediction is computed
> > using basic item-based collaborative filtering algorithm."
> >
> > --
> > View this message in context:
> > http://old.nabble.com/item-based-recommendation-neighbourhood-size-tp27661482p27666954.html
> > Sent from the Mahout User List mailing list archive at Nabble.com.
>
> --
> Ted Dunning, CTO
> DeepDyve

--
Gökhan Çapan
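
For reference, a minimal sketch of how step 1 could look with Mahout's Taste
API (the file name, the choice of LogLikelihoodSimilarity, k=20, and the
persist() call are my assumptions for illustration, not fixed by the steps
above):

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class PrecomputeSimilarItems {
  public static void main(String[] args) throws Exception {
    // user,item,rating triples; the file name is just an example
    DataModel model = new FileDataModel(new File("ratings.csv"));
    ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);

    int k = 20; // model size: how many similar items to keep per item
    LongPrimitiveIterator itemIDs = model.getItemIDs();
    while (itemIDs.hasNext()) {
      long itemID = itemIDs.nextLong();
      // k most similar items with their similarity scores for this item;
      // these lists are what the online step reads at recommendation time
      List<RecommendedItem> similar = recommender.mostSimilarItems(itemID, k);
      // persist(itemID, similar);  // hypothetical storage call
    }
  }
}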

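And a rough sketch of the online part (steps 2 to 4), assuming the lists
precomputed above have been loaded into a map in memory; class names, the
map layout, and the similarity-times-rating weighting are illustrative, not
the exact code I use:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OnlineRecommender {

  // itemID -> (similar itemID -> similarity), precomputed in step 1
  private final Map<Long, Map<Long, Double>> similarItems;

  public OnlineRecommender(Map<Long, Map<Long, Double>> similarItems) {
    this.similarItems = similarItems;
  }

  /** userHistory: itemID -> rating; returns the top-n item IDs (steps 2-4). */
  public List<Long> recommend(Map<Long, Double> userHistory, int n) {
    // Accumulate a score for every unseen candidate; an item that is similar
    // to several history items accumulates a higher score (step 3).
    Map<Long, Double> scores = new HashMap<Long, Double>();
    for (Map.Entry<Long, Double> rated : userHistory.entrySet()) {
      Map<Long, Double> neighbours = similarItems.get(rated.getKey());
      if (neighbours == null) {
        continue;
      }
      for (Map.Entry<Long, Double> neighbour : neighbours.entrySet()) {
        long candidate = neighbour.getKey();
        if (userHistory.containsKey(candidate)) {
          continue; // already rated by the user, do not recommend it
        }
        // weight the similarity by the user's rating of the history item
        double weighted = neighbour.getValue() * rated.getValue();
        Double current = scores.get(candidate);
        scores.put(candidate, current == null ? weighted : current + weighted);
      }
    }
    // Sort the candidates by score and return the top n (step 4).
    List<Map.Entry<Long, Double>> ranked =
        new ArrayList<Map.Entry<Long, Double>>(scores.entrySet());
    Collections.sort(ranked, new Comparator<Map.Entry<Long, Double>>() {
      public int compare(Map.Entry<Long, Double> a, Map.Entry<Long, Double> b) {
        return b.getValue().compareTo(a.getValue());
      }
    });
    List<Long> topN = new ArrayList<Long>();
    for (Map.Entry<Long, Double> entry : ranked) {
      if (topN.size() == n) {
        break;
      }
      topN.add(entry.getKey());
    }
    return topN;
  }
}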