feature requests regarding GenericItembasedRecommender

Sebastian Schelter Fri, 03 Dec 2010 06:53:13 -0800

Hi,

I ran into some issues with GenericItembasedRecommender this week, which I could only work around by creating a custom ItembasedRecommender implementation. I think the issues might be worth discussing here and I'd look forward to committing back my changes if we find them useful.

The first issue is with GenericItembasedRecommender.MultiMostSimilarEstimator, which is used to compute the most similar items to a collection of items. The current implementation filters out every item that is not similar (has NaN as its similarity value) to at least one of the input items. While this might be algorithmically correct, it very often leads to empty results. Users might, for example, put very different things in a shopping cart, and in my experience using those things as input for mostSimilarItems produces empty results in lots of cases. My workaround was to interpret NaN as 0 when computing the average estimate here (and in the end to filter out results that had 0 as their average), thus allowing an item to be included in the result if it is similar to at least one of the input items. If we decide to include this, we could either introduce a second mostSimilarItems method or make it receive a parameter to determine the "exclusion mode" or whatever we might call it.
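
To make the difference between the two behaviours concrete, here is a minimal sketch in plain Java. It does not use the actual Mahout classes: the class name LenientMultiItemEstimate, the method estimate and the boolean "lenient" flag are made up for illustration, and the input array stands for the similarities of one candidate item to each of the input items.

```java
/**
 * Illustrative sketch only: not the Mahout MultiMostSimilarEstimator.
 * All names here are hypothetical.
 */
final class LenientMultiItemEstimate {

  private LenientMultiItemEstimate() {
  }

  /**
   * Averages the similarities of one candidate item to all input items.
   *
   * @param similarities similarity of the candidate to each input item,
   *                     NaN meaning "no similarity known"
   * @param lenient      false: a single NaN excludes the candidate;
   *                     true: NaN is counted as 0 in the average
   * @return the average, or NaN if the candidate should be excluded
   */
  static double estimate(double[] similarities, boolean lenient) {
    if (similarities.length == 0) {
      return Double.NaN;
    }
    double sum = 0.0;
    for (double s : similarities) {
      if (Double.isNaN(s)) {
        if (!lenient) {
          return Double.NaN; // strict mode: one missing similarity is enough
        }
        continue; // lenient mode: NaN contributes 0 to the sum
      }
      sum += s;
    }
    double average = sum / similarities.length;
    // In lenient mode, drop candidates whose average ended up as 0.
    return (lenient && average == 0.0) ? Double.NaN : average;
  }
}
```

With the similarities {0.8, NaN}, the strict mode excludes the candidate while the lenient mode keeps it with an average of 0.4. A candidate with only NaN values is excluded in both modes.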

The second issue is a little bit more complicated. A while ago we introduced a component called CandidateItemsStrategy to enable the customization of the selection of the initial candidate items that might be recommended to a user. I noticed that we actually should do the same thing with the selection of candidate items for mostSimilarItems, which is currently done in GenericItembasedRecommender.doMostSimilarItems(...). The current approach especially wastes CPU time when we use precomputed similarities (GenericItemSimilarity or FileItemSimilarity), because we already "know" the possibly similar items. Unfortunately there's no way to ask ItemSimilarity to directly give you all items similar to a given item (which would be the most efficient approach when dealing with precomputed similarities). I created a small file-based indexing component which can be asked for those, but I'm not too happy with spreading the information about the precomputed similarities across components. Still, I think we should work on improving the efficiency here, as it turned out to be a performance killer in my use case.
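
A rough sketch of what such a pluggable candidate selection could look like, again in plain Java and independent of the Mahout code base. The interface SimilarItemsCandidateStrategy, the class PrecomputedCandidates and their methods are hypothetical names chosen for this example, and the in-memory map only stands in for the file-based index mentioned above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical interface: picks the candidates for a most-similar-items query. */
interface SimilarItemsCandidateStrategy {
  Set<Long> candidatesFor(long[] itemIDs);
}

/**
 * Hypothetical strategy backed by precomputed similarities: only items that
 * have a known similarity to at least one input item become candidates.
 */
final class PrecomputedCandidates implements SimilarItemsCandidateStrategy {

  // itemID -> IDs of all items with a precomputed similarity to it
  private final Map<Long, Set<Long>> neighbors = new HashMap<>();

  /** Registers one precomputed (symmetric) similarity pair. */
  void addPair(long itemA, long itemB) {
    neighbors.computeIfAbsent(itemA, k -> new HashSet<>()).add(itemB);
    neighbors.computeIfAbsent(itemB, k -> new HashSet<>()).add(itemA);
  }

  @Override
  public Set<Long> candidatesFor(long[] itemIDs) {
    Set<Long> candidates = new HashSet<>();
    for (long itemID : itemIDs) {
      candidates.addAll(neighbors.getOrDefault(itemID, Collections.emptySet()));
    }
    // The input items themselves are never candidates.
    for (long itemID : itemIDs) {
      candidates.remove(itemID);
    }
    return candidates;
  }
}
```

The point of the design is that the recommender would only score the items returned by the strategy, so with precomputed similarities it never has to look at items that cannot be similar to the input.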

I hope I have made clear what the problems were (and what solutions I propose). I could also supply a patch in the coming weeks, but I wanted to have a discussion first.


--sebastian
