thanks a lot for the explanation. that makes sense.
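
To check my understanding of the explanation quoted below, here is a minimal sketch of the two ideas: the candidate set built from items connected to anything the user prefers, and the estimate as a similarity-weighted average of the user's own preferences. This is not Mahout's code. The function names, the dictionary layout, and the toy data are all my own assumptions for illustration, and normalizing by the sum of absolute similarities is just one common choice.

```python
from typing import Dict, Set

# Hypothetical toy structures, not Mahout's API or data model.
Prefs = Dict[str, float]            # item -> this user's preference value
Sims = Dict[str, Dict[str, float]]  # item -> {other item -> similarity}


def candidate_items(user_prefs: Prefs, sims: Sims) -> Set[str]:
    """Items with a known similarity to at least one item the user prefers,
    excluding items the user has already rated."""
    candidates: Set[str] = set()
    for item in user_prefs:
        candidates.update(sims.get(item, {}))
    return candidates - set(user_prefs)


def estimate(user_prefs: Prefs, sims: Sims, target: str) -> float:
    """Similarity-weighted average of the user's preferences.
    Uses all of the user's rated items that have a similarity to the target,
    so no separate neighbourhood parameter is needed."""
    weighted_sum = 0.0
    weight_total = 0.0
    for item, pref in user_prefs.items():
        sim = sims.get(target, {}).get(item)
        if sim is None:
            continue  # no similarity known between target and this item
        weighted_sum += sim * pref
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else float("nan")


# Assumed example data: user1 has rated A and B; E and F are unrelated items.
user1 = {"A": 5.0, "B": 3.0}
sims = {
    "A": {"C": 0.9},
    "B": {"C": 0.1, "D": 0.5},
    "C": {"A": 0.9, "B": 0.1},
    "D": {"B": 0.5},
    "E": {"F": 0.8},
    "F": {"E": 0.8},
}

for item in sorted(candidate_items(user1, sims)):
    print(item, estimate(user1, sims, item))
```

In this toy data E and F never get scored, because neither is connected to anything user1 has rated. That is the sense in which the candidate set already behaves like one large neighbourhood.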
srowen wrote:
>
> You estimate a preference for each of those items, yes, in either
> user-based or item-based recommendation. In item-based recommendation,
> the estimate is a weighted average -- it's the user's preferences for
> various items, weighted by their similarity to the given item.
>
> In that case you don't need a neighborhood. The items of interest are
> the user's preferred items -- and you want to use all of them, not a
> subset.
>
> It's not quite symmetrical with user-based recommendation, which is
> based on user similarity. There, you need to constrain yourself to
> examine only a subset of all users, a neighborhood, or else it would
> be wildly inefficient.
>
> But in item-based recommendation you don't have this issue. *Given an
> item*, you already know the very small number of items it needs to be
> compared to -- the user's preferred items. That takes the place of a
> neighborhood in a sense.
>
> You could say, well, then the problem is elsewhere: how can
> considering all possible items for recommendation be efficient? if we
> use neighborhoods to get around that in user-based, why not
> item-based? In fact the algorithm doesn't actually look at every item
> -- it constructs a set of items that are at all connected to any item
> the user prefers, in order to rule out most items that can't possibly
> be recommended.
>
> In that sense a 'neighborhood' comes into play: the set of all items
> considered is really the union of all maximal neighborhoods around any
> item that the user prefers. That's a big neighborhood, and if this is
> what you mean, you are correct that you could reasonably add
> parameters to constrain that neighborhood.
>
> The reasons maybe you don't want to do that are:
>
> 1) Item similarity is often 'fast' in that it is sometimes precomputed
> based on outside information. So sorting through a lot of potential
> items doesn't hurt much.
>
> 2) It's not part of the canonical item-based algorithm, but that's not
> a great reason.
>
> 3) Computing this neighborhood gets expensive: it must be defined
> based on distance to all items in the set, not one. That is, being far
> from or near to one item doesn't mean anything by itself. It matters
> how close it is to the whole set. By the time you're computing that...
> might as well just use the canonical algorithm.
>
> On Sat, Feb 20, 2010 at 11:22 AM, jamborta <[email protected]> wrote:
>>
>> but as far as I understand your implementation you take user1 and then
>> get all the items that the user hasn't rated (getAllOtherItems()) and
>> generate recommendation for each of these items. therefore, you have
>> user1 item1, user1 item2, etc as input. so the neighbourhood can be
>> restricted for each of these items.
>>
>> Tamas
>
>

--
View this message in context: http://old.nabble.com/item-based-recommendation-neighbourhood-size-tp27661482p27666452.html
Sent from the Mahout User List mailing list archive at Nabble.com.
